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ABSTRACT 


Multiscale  techniques  such  as  the  wavelet  transform  have  become  a  power¬ 
ful  tool  for  image  compression,  denoising,  and  analysis.  This  thesis  outlines  the 
nse  of  such  techniques  for  developing  a  mnltiscale  processing  stream  for  the  pnr- 
pose  of  airborne  landmine  detection.  In  this  work,  the  Reed-Xiaoli  (RX)  mnlti- 
spectral  anomaly  detection  algorithm  is  extended  to  perform  detection  within  the 
shift-invariant  wavelet  representation  of  a  given  single-band  image.  A  test  statistic 
is  derived  and  shown  to  better  detect  anomalies  in  a  correlated  noise  backgronnd 
than  the  single-band  RX  algorithm.  The  resnlts  of  a  detection  algorithm  using  the 
shift-invariant  wavelet  representation  and  a  mnlti-layer  perceptron  neural  network 
are  also  discussed.  A  scale-space  image  registration  algorithm  is  presented,  where 
the  scale-invariant  featnre  transform  (SIFT)  has  been  used  to  search  for  control 
points  between  two  images.  The  use  of  SIFT  allows  for  image  features  invariant  to 
scaling,  translation  and  rotation  to  be  matched  in  featnre  space,  resulting  in  more 
accurate  image  registration  than  traditional  correlation-based  techniqnes. 
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1.  INTRODUCTION 


Multiscale  processing  has  become  popular  for  signal  and  image  analysis  in 
recent  times.  The  primary  focus  of  this  research  has  been  to  apply  multiscale  pro¬ 
cessing  techniques  with  applications  to  an  airborne  mine  detection  system.  Mine 
detection  research  aims  to  facilitate  cost-effective  humanitarian  demining  as  well  as 
to  gain  a  signihcant  advantage  in  tactical  offense  and  defense  capabilities.  A  wide 
variety  of  technologies  have  been  under  experimentation  in  order  to  develop  safe 
and  reliable  solntions  to  the  landmine  detection  problem.  This  thesis  addresses  the 
use  of  multiscale  techniques  in  two  important  areas  of  any  imaging-based  airborne 
mine  detection  system:  target  detection  and  image  registration. 

1.1.  AIRBORNE  LANDMINE  DETECTION 

Most  mine  and  mineheld  detection  technologies  can  be  classihed  as  either  being 
gronnd-based  or  airborne.  Gronnd-based  technologies  usnally  include  some  device 
which  scans  the  gronnd  with  some  electromagnetic  sensor.  This  sensor  can  be  either 
handheld,  or  monnted  on  a  vehicular  platform.  Specihc  details  of  such  ground-based 
mine  detection  platforms  can  be  found  in  [1]. 

The  major  drawback  to  any  ground-based  mine  detection  system  is  its  limited 
standoff  distance  and  large  scanning  time.  This  has  led  to  the  consideration  of  air¬ 
borne  systems  as  a  viable  alternative  to  gronnd-based  detection.  Collection  of  data 
over  very  large  areas  reqnires  the  use  and  development  of  antomatic  and/or  aided 
target  recognition  (ATR/AiTR)  systems.  Most  airborne  platforms  cnrrently  nnder 
research  employ  some  type  of  active  or  passive  electro-optical  sensing  technique  to 
gather  imagery.  Data  presented  in  this  thesis  have  been  collected  with  sensors  op¬ 
erating  in  the  mid-wave  infrared  (MWIR)  spectral  range.  More  recently  collected 
data  have  been  obtained  from  a  variety  of  sensors  operating  in  the  MWIR,  near 
infrared,  and  the  visible  red,  green,  and  blue  bands  of  the  freqnency  spectrum.  This 
newer  data  is  cnrrently  held  under  a  non-disclosnre  agreement  and  is  not  included 
in  this  thesis.  The  details  of  several  types  of  imaging  sensors  can  be  fonnd  in  [2]. 
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Some  of  the  U.S.  Department  of  Defense  programs  that  have  been  or  are  cur¬ 
rently  engaged  in  airborne  landmine  detection  systems  development  are  the  follow¬ 
ing:  Airborne  Mineheld  Detection  and  Reconnaissance  Systems  (AMIDARS)  [3], 
Remote  Mineheld  Detection  System  (REMIDS)  [4],  Airborne  Standoff  Mineheld 
Detection  System  (ASTAMIDS)  [3],  Coastal  Battleheld  Reconnaissance  and  Anal¬ 
ysis  System  (COBRA)  [5],  Airborne  Infrared  Measurement  System  (AIRMS)  [6], 
Airborne  Far  IR  Mineheld  Detection  System  (AFIRMIS)  [7],  and  the  Lightweight 
Airborne  Multispectral  Mineheld  Detection  System  (LAMD)  [8]. 

1.2.  AUTOMATED  ANOMALY  DETECTION 

Automatic  target  detection  is  an  important  step  in  any  landmine  detection  sys¬ 
tem.  When  any  signihcant  amount  of  imagery  has  been  collected  from  an  airborne 
platform,  it  must  be  processed  to  provide  the  most  relevant  and  meaningful  infor¬ 
mation  to  the  warhghter-in-the-loop  (WIL)  who  is  the  higher-level  decision  maker. 
Many,  if  not  most,  images  that  are  collected  in  this  process  have  no  mineheld  re¬ 
gions.  It  is  necessary  to  hlter  out  most  of  these  images  without  minehelds  so  that 
the  WIL  is  not  overwhelmed.  An  automated  method  to  determine  whether  relevant 
information  is  contained  within  a  given  image  must  be  employed. 

The  baseline  detection  algorithm  in  the  COBRA  program  is  the  Reed-Xialoi 
(RX)  algorithm  as  implemented  by  Holmes  in  [9].  The  RX  algorithm  is  also  the 
prime  candidate  for  implementation  as  the  anomaly  detector  in  the  LAMD  pro¬ 
gram.  Other  techniques  for  local  anomaly  detection  include  using  Gaussian-Markov 
random  helds  (GMRF)  [10]  and  stochastic  expectation-maximization  methods  [11]. 
There  is  also  signihcant  research  into  false  alarm  mitigation  (FAM)  techniques  [12] 
which  aim  to  reduce  anomalous  detections  which  are  not  likely  landmines.  Further 
high-level  processing,  such  as  patterned  mineheld  detection  [13]  and  line  detectors 
[14]  can  compliment  the  low-level  mine  detection  algorithm  to  identify  mineheld-like 
patterns  or  distributions. 

1.3.  AUTOMATED  IMAGE  REGISTRATION 

Image  registration  is  the  process  of  overlaying  two  or  more  images  of  the  same 
scene  taken  at  diherent  times,  from  diherent  viewpoints  and/or  by  diherent  sensors. 


3 


The  major  purpose  of  registration  is  to  remove  or  suppress  geometric  distortions  be¬ 
tween  the  reference  and  sensed  images,  which  are  introduced  due  to  different  imaging 
conditions,  and  thus  to  bring  images  into  geometric  alignment  [15].  For  the  airborne 
mine  and  mineheld  detection  problem,  image  registration  must  be  performed  in  order 
to  provide  a  montage  of  imagery  with  targets  for  mineheld  analysis  of  the  collected 
data.  The  registered  images  are  also  useful  for  the  war£ghter-in-the-loop  analysis 
of  mineheld  data.  Co-registration  between  diherent  bands  of  image  data  is  required 
for  sensor  fusion  and  multi-band  processing.  An  automated  registration  system  can 
perform  this  operation  with  little  or  no  human  interaction. 

1.4.  WAVELET  AND  SCALE-SPACE  TECHNIQUES 

Wavelet  based  signal  and  image  processing  has  been  employed  for  many  pur¬ 
poses  over  the  past  two  decades  [16].  One  of  the  most  recent  uses  of  wavelets  in 
image  processing  has  been  in  the  held  of  compression.  The  JPEG  2000  standard 
[17],  which  utilizes  the  wavelet  transform  to  provide  for  compression,  can  deliver 
remarkably  high  compression  rates  when  compared  to  its  predecessor’s  (JPEG)  dis¬ 
crete  cosine  transform  based  algorithm  without  drawbacks  such  as  size  constraints 
and  blocky  artifacts.  The  JPEG  2000  standard,  along  with  a  wavelet-diherence 
reduction  algorithm  [18]  are  considered  to  be  likely  possibilities  for  baseline  image 
compression  within  the  airborne  landmine  detection  programs. 

Whereas  the  literature  provides  copious  examples  of  the  use  of  the  wavelet 
transform  for  compression  and  denoising,  it  is  used  in  this  work  as  a  tool  for  signal 
analysis.  The  critically  sampled  wavelet  transform  finds  its  place  in  the  aforemen¬ 
tioned  fields,  but  is  surprisingly  a  poor  tool  for  extracting  or  identifying  shape  in¬ 
formation  from  a  given  signal  or  image  due  to  its  lack  of  shift-invariance.  This  work 
makes  use  of  the  shift-invariant  wavelet  transform,  in  which  a  highly  redundant  rep¬ 
resentation  arises  from  the  transformed  data  making  it  a  good  tool  for  analysis.  This 
type  of  wavelet  transform  is  used  for  the  anomaly  detection  algorithms  presented  in 
this  thesis. 

Scale-space  techniques  provide  a  method,  similar  to  wavelets,  for  analyzing 
structures  at  diherent  scales.  Rather  than  using  a  wavelet  to  build  a  wavelet- 
domain  description  of  a  function,  scale-space  employs  the  Gaussian  function  on 
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which  to  analyze  signals  and  imagery.  The  scale-space  in  effect  serves  as  a  represen¬ 
tation  that  attempts  to  mimic  the  mammalian  visual  cortex  [19].  In  this  thesis  the 
scale-invariant  feature  transform  (SIFT)  [20],  a  novel  technique  applying  scale-space 
theory  to  image  processing  and  object  recognition,  is  implemented  and  evaluated 
for  the  purpose  of  image  registration. 

1.5.  OVERVIEW  OF  THE  THESIS 

This  thesis  is  organized  into  hve  sections.  The  hrst  section  introduces  the 
reader  to  the  landmine  detection  problem  and  gives  a  brief  overview  of  the  topics 
that  have  been  discussed  in  detail  throughout  the  thesis.  The  wavelet  transform 
is  introduced  in  the  second  section.  The  lack  of  shift-invariance  in  critically  sam¬ 
pled  transforms  is  discussed  and  examples  are  presented.  The  stationary,  or  shift- 
invariant,  wavelet  transform  is  also  detailed  in  the  this  section.  The  third  section  is 
devoted  to  anomaly  detection  algorithms.  The  baseline  RX  algorithm  is  discussed 
in  detail,  and  then  used  to  derive  a  closely  related  algorithm  which  the  author  has 
called  the  multiscale  RX-based  algorithm.  This  algorithm  performs  anomaly  detec¬ 
tion  in  the  wavelet  domain.  A  multilayer  neural  network  is  then  presented  which 
is  also  used  to  perform  detection  on  stationary  wavelet  coefficients.  The  results  of 
both  detection  architectures  are  presented  and  discussed. 

The  fourth  section  details  the  use  of  scale-space  techniques  for  automated  im¬ 
age  registration.  The  basic  terminology  and  principles  of  image  registration  are  pre¬ 
sented,  along  with  the  current  correlation-based  method  of  control  point  selection. 
The  scale-invariant  feature  transform  is  presented  in  detail  along  with  discussion  of 
applying  the  algorithm  to  the  image  registration  problem.  The  thesis  is  concluded 
in  the  hfth  section  with  a  discussion  of  the  results  presented  and  relevant  future 
work. 
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2.  THE  WAVELET  TRANSFORM 


2.1.  DEFINITION  OF  THE  CWT 

The  wavelet  transform  of  a  continuous-time  signal  f{t)  with  respect  to  a  mother 
wavelet  ipit)  is  known  as  the  continuous-time  wavelet  transform  (CWT)  and  is  de¬ 
fined  as 

i'"'  (^) 

where  both  f(t)  and  'ip(t)  are  square-integrable  functions,  i.e.  f(t),'ip(t)  G  The 
wavelet  'ip(t)  has  the  additional  property  that 


W{s,u)  = 


'ip(t)dt  =  0. 


(2.2) 


Note  that  Eq.  2.2  implies  that  the  zeroth  (DC)  frequency  component  of  the  Fourier 
transform  of  '^(t)  is  zero,  thus  characterizing  ip{t)  as  a  bandpass  function.  Eq.  2.1 
can  be  written  more  compactly  by  dehning 

^  .  (2.3) 

Then 

/CXD 

fmtjmt.  (2.4) 

■oo 

Note  that  t/’i,o(^)  =  f’it)-  The  normalizing  factor  l/^/\s\  ensures  that  the  energy 
remains  unchanged  for  all  s  and  u,  i.e. 


|^s,n(t)|^dt  =  /  \i){t)\^dt. 


(2.5) 


The  parameter  s  represents  the  time  dilation  of  the  wavelet  'ip{t)  and  u  represents  its 
shift.  Specihcally,  for  a  given  value  of  s,  'ips,u(t)  is  a  shift  of  'ips,o(t)  by  an  amount  u 
along  the  time  axis.  'ips,o(t)  is  a  time-scaled  (and  amplitude-scaled)  version  of  'ipit). 
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If  the  wavelet  ipit)  satisfies  the  admissibility  condition 


a,,= 


[OJ 


-duj 


\0J\ 


(2.6) 


where  0  <  <  cx)  and  is  the  Fourier  transform  of  then  the  inverse 

CWT  exists  and  is  given  by 

-1  ^CXD  POO  1 

f{t)  =  yr  /  -^W{s,u)^jJs,u{t)  ds  du.  (2.7) 

O'ip  J  —CO  J  —  CO  ^ 

It  should  be  noted  here  that  a  wavelet  need  not  satisfy  the  admissibility  condition, 
and  thus  the  inverse  wavelet  transform  may  not  exist. 

The  CWT  can  be  viewed  as  an  operator  which  maps  a  square-integrable  func¬ 
tion  of  a  single  real  variable,  time,  into  a  function  of  two  real  variables,  scale  and 
translation.  This  wavelet  representation  is  a  function  of  all  possible  scales  and  trans¬ 
lations  of  the  mother  wavelet,  and  its  support  covers  the  entire  plane  of  This 
representation  is  inappropriate  for  numeric  computations,  and  is  highly  redundant 
in  the  sense  that  the  entire  continuum  of  wavelet  coefficients  is  not  needed  to  recon¬ 
struct  the  original  transformed  function.  It  is  then  not  unreasonable  to  search  for  a 
representation  of  the  form 

OO  OO 

f{t)=  ^  ^  d{j,k)ij{2-H  -  k),  (2.8) 

j  =  —  OQ  k=—oo 

where  the  values  d{k,l)  are  related  to  the  values  of  W{s,u)  at  s  =  2^  and  u  = 
2^1.  This  is  known  as  dyadic  sampling  of  the  CWT  since  the  consecutive  values 
of  the  discrete  scales  and  the  sampling  intervals  differ  by  a  factor  of  two.  The 
two-dimensional  sequence  d{k,l)  is  commonly  referred  to  as  the  discrete  wavelet 
transform  (DWT).  This  name  is  somewhat  misleading  since  this  is  still  the  transform 
of  a  continuous-time  signal  with  discretization  only  in  the  s  and  u  variables  of 
the  CWT.  The  same  name  will  be  used  later  for  wavelet  transforms  of  discrete¬ 
time  signals  or  discrete  images.  The  representation  in  Eq.  2.8  is  analogous  to  the 
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Fourier  series,  where  a  summation  of  weighted  sines  or  cosines  over  a  countable  set 
of  frequencies  is  used  to  construct  a  periodic  continuous-time  signal. 

2.2.  MULTIRESOLUTION  ANALYSIS 

While  there  are  two  ways  to  introduce  the  DWT,  with  the  CWT  being  one, 
the  method  popular  in  the  signal  processing  literature  dealing  with  wavelet  theory 
is  the  multiresolution  analysis  (MRA)  developed  by  Meyer  and  Mallat  [16].  The 
MRA  is  an  important  building  block  for  construction  of  orthogonal  wavelets  and 
development  of  the  DWT  and  hlterbank  algorithms.  Wavelet  representations  such 
as  those  in  Eq.  2.8  arise  naturally  in  the  context  of  an  MRA. 

2.2.1.  Definition.  An  MRA  provides  the  foundation  for  approximating 
functions  in  a  sequence  of  nested  linear  vector  spaces.  Not  every  sequence  of  nested 
vector  spaces  yields  an  MRA.  An  MRA  consists  of  a  sequence  {Vjjjez  of  closed 
subspaces  where  the  following  properties  are  satisfied: 

1.  is  invariant  to  any  translation  proportional  to  the  scale  2T 

fit)  e  V,  fit  -  2^k)  e  V„  V(j,  k)  e  (2.9) 

2.  A  sequence  of  vector  spaces  is  nested. 

V,  cv,_i,  VjeZ  (2.10) 

3.  Dilating  a  function  in  \ j  by  a  factor  of  two  guarantees  that  an 
approximation  is  dehned  at  a  coarser  resolution. 

/(t)  e  V,  /(2f)  G  V,_i,  VjeZ  (2.11) 

4.  The  intersection  of  these  subspaces  is  a  singleton  set  containing  the 
all-zero  function. 

OO 

Urn  Vj=  fl  V,  =  {0} 


(2.12) 


5.  The  union  of  these  subspaces  is  dense  in  the  space  of  square-integrable 
functions. 


lim  V, 

j->-oo 


Closure 


L^{R) 


(2.13) 


6.  There  exists  a  scaling  function  (fit)  such  that  {(fit  —  n)}n&  is  a 
Riesz  basis  of  Vq.  This  implies  that  there  exist  R  >  0  and  B  >  i) 
such  that  any  /  G  Vq  can  be  uniquely  decomposed  into 

OO 

f{t)=  ^  an(f{t-n)  (2.14) 

n=—oo 

where 

CXD 

A\\fr<  (2.15) 

n=—oo 

It  should  be  noted  that  there  are  alternate  logically  equivalent  axioms  that  can  be 
used  to  dehne  an  MRA. 

2.2.2.  Approximation  and  Detail  Subspaces.  From  the  above  def¬ 
inition  of  an  MRA,  a  complicated  function  can  be  divided  into  simpler  ones  that 
can  studied  separately.  The  scaling  function  of  Property  6  generates  the  approxi¬ 
mation  subspace  Vq.  Along  with  Properties  1  and  3,  the  scaling  function  satisfies 
the  dilation  equation 

hi[k](f{2t  —  k) ,  (2.16) 

k 

where  hi[k]  G  f'^(Z).  The  scaling  function  (fit)  is  a  superposition  of  translated 
and  scaled  copies  of  itself,  hence  its  name.  Eq.  2.16  is  sometimes  called  a  two- 
scale  relation,  since  it  relates  (f{t)  to  itself  at  an  adjacent  scale.  The  result  can  be 
generalized  to  relate  the  scaling  function  at  any  given  scale  to  the  next  scale  by 
replacing  t  with  2~H,  giving 


0(2  H)  =  '^^hi[k](fi2  ^h  —  k). 


(2.17) 
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The  subspace  Yj  is  generated  by  the  bases  —  /c)}j,fcez-  Since  is  a 

proper  subspace  of  Vj_i,  the  space  left  in  Vj_i  not  covered  by  Yj  is  labeled  Wj 
and  is  orthogonally  complimentary  to  Yj  in  Vj_i.  This  space  is  called  the  detail 
subspace.  The  subspace  is  generated  by  the  bases  —  k)}j^kez-  The 

wavelet  ip  also  obeys  a  two-scale  relation 

ip{t)  =  '^gi[k](j){2t- k),  (2.18) 

k 

and  satishes  the  properties  noted  earlier  -  it  is  square-integrable  and  bandpass.  Here 
the  wavelet  is  composed  of  dilated  and  translated  copies  of  the  scaling  function. 
Properties  of  the  sequences  hi[k]  and  gi[k]  will  soon  be  discussed  in  detail. 

Using  direct  sum  notation,  the  relations  between  the  approximation  subspace 
Vo  and  detail  subspace  Wq  are 

V_i  =  Vq  ©  ^Vo  and  Vq  H  Wq  =  {0},  (2.19) 

which  can  be  generalized  to 

Vj_i  =  ©  Wj  and  Yj  n  Wj  =  {0}.  (2.20) 

Figure  2.1  shows  the  relationship  between  the  approximation  and  detail  subspaces. 
Applying  Eq.  2.20  recursively  yields 

OO 

W-i  =  ©W„.  (2,21) 

n=j 

As  j  — >  —OO,  the  space  L^(M)  can  be  written  solely  in  terms  of  the  detail  subspaces: 

OO 

lim  V,  =  0  W,,  =  L\R). 

j^-OO 


n=— OO 


(2.22) 
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V_2  =  V_1  ©  W_1  =  Vo  ©  Wo  ©  W_1 


Figure  2.1  Relationship  between  approximation  and  detail  subspaces 


This  leads  to  the  representation  of  a  sqnare-integrable  fnnction  f{t)  by  a  weighted 
sum  of  wavelets  over  a  countable  set  of  translations  and  dilations,  i.e. 

OO  OO 

(2.23) 

j  =  —  OQ  k=—oo 


2.3.  DIGITAL  FILTERING  INTERPRETATION 

2.3.1.  Properties  of  Synthesis  and  Analysis  Filters.  Supposing  that 
an  orthonormal  MRA  is  to  be  constrncted,  the  scaling  fnnction  (j){t)  must  satisfy 
the  following  conditions: 


1.  The  scaling  function  integrates  to  one: 


1. 


(2.24) 


2.  It  has  unit  energy: 


1101 


|0(t)|^(it 


1. 


—  OO 


(2.25) 
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3.  The  integer  translations  of  (j){t)  are  orthogonal: 

(0(t),0(t-n))  =  (5(n).  (2.26) 

From  the  two-scale  relations  in  Eqs.  2.16  and  2.18,  and  along  with  the  conditions 
in  Eqs.  2.2  and  2.24,  the  sequences  hi{n)  and  gi{n)  then  satisfy 

J2hiin)  =  2,  (2.27) 

n 

and 

Y,9i{n)  =  Q.  (2.28) 

n 

These  properties  are  found  by  integrating  both  sides  of  the  two-scale  relations.  Eqs. 
2.27  and  2.28  imply  that  the  DC  components  of  the  discrete-time  Fourier  transform 
of  hi{n)  and  gi{n)  are  2  and  0  respectively.  Thus  hi{n)  is  a  lowpass  sequence  and 
gi{n)  is  bandpass  (or  highpass). 

Decomposition  hlters  can  now  be  dehned  from  the  sequences  hi{n)  and  gi{n). 
The  set  of  analysis  (decomposition)  hlters  is 

h{n)  =  hi{—n)  and  g{n)  =  gi{—n),  (2.29) 

and  the  synthesis  (reconstruction)  hlters  are 

h{n)  =  and  g{n)  =  (2.30) 

The  hlters  dehned  above  can  either  be  hnite  impulse  response  (FIR)  or  inhnite 
impulse  response  (HR)  hlters  depending  on  the  scaling  function  used  to  construct 
the  sequences  hi{n)  and  giiji).  Construction  of  these  sequences  from  0(t),  and  vice- 
versa,  is  discussed  in  detail  in  [16],  [21],  and  [22].  It  can  be  shown  [21]  that  these 
four  hlters  are  perfect  reconstruction  quadrature  mirror  hlters  (PRQMFs).  Each 
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filter  thus  satisfies  a  power  complementary  condition, 

|/i(a;)p  +  |/i(a;  +  7r)p  =  1, 
|h(a;)|^  +  |h(a;  +  7r)|^  =  1, 

|^Mr  +  |^(o;  +  7r)r  =  1, 
|lMp  +  ||(o;  +  7r)p  =  1, 


which  has  been  expressed  in  the  frequency  domain.  Relationships  between  the 
lowpass  and  highpass  hlters  can  also  be  presented.  These  follow  from  the  properties 
of  the  sequences  hi{n)  and  gi{n)  [21]. 


h{u)g*  {(jj)  +  h{(jj  +  Ti)g*  {ui  +  Ti)  =  Q  (2-31) 

h{ijj)g*  {oj)  +  h{ijj  +  Ti)g*  {oj  +  n)  =  Q  (2.32) 

The  time-domain  expressions  below  can  be  derived  from  those  above: 

'^^^h{n)h{n  +  2k)  = -5{k)^  (2.33) 

n 

g(n)g(n  +  2k)  =  ]^5(k),  (2.34) 

n 

'^^h(n)g(n  +  2k)  =  0.  (2.35) 

n 


Similar  properties  can  be  written  for  the  analysis  hlters  since  they  are  proportional  to 
the  rehections  of  h(n)  and  g(n).  The  equations  above  imply  that  the  LPF  response 
h(n)  is  orthogonal  to  even  translations  of  itself.  The  same  holds  for  the  HPF  impulse 
response.  Orthogonality  also  exists  between  the  LPF  response  and  even  translations 
of  the  HPF  response. 

2.3.2.  Synthesis  and  Analysis  with  PRQMFs.  Shown  in  Figure  2.2 
is  the  decomposition  and  reconstruction  paradigm  of  the  discrete-time  signal  f(n). 
This  provides  the  basis  for  signal  analysis  and  synthesis  using  the  wavelet  transform. 
The  input  signal  is  convolved  with  the  impulse  response  functions  of  the  lowpass  and 
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Decomposition  Reconstruction 

Figure  2.2  Signal  decomposition  and  reconstruction  with  PRQMFs 

highpass  decomposition  filters.  These  two  resulting  signals  are  then  downsampled 
by  a  factor  of  two  (2  |)  by  only  keeping  the  samples  with  even  indices.  The  se¬ 
quences  p  and  q  can  be  called  the  approximation  and  detail  coefficients  respectively. 
These  coefficients  can  then  be  upsampled  by  a  factor  of  two  (2  1)  by  inserting  zeros 
between  the  adjacent  samples.  The  upsampled  coefficients  are  then  convolved  with 
their  respective  lowpass  and  highpass  reconstruction  hlter  and  the  result  is  summed, 
yielding  the  original  signal  f{n). 

The  DWT  is  usually  computed  with  multiple  levels  of  approximation  and  detail 
coefficients.  The  cascade  hlterbank  used  to  accomplish  this  multilevel  decomposi¬ 
tion  is  shown  in  Figure  2.3.  The  reconstruction  hlterbank  is  shown  in  Figure  2.4. 
Figures  2.2,  2.3,  and  2.4  have  been  adapted  from  [21]. 


Figure  2.3  Two-level  DWT  analysis  hlterbank 
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Figure  2.4  Two-level  DWT  synthesis  filterbank 


2.4.  THE  SHIFT-INVARIANT  DWT 

Unfortunately,  for  some  purposes  of  signal  analysis,  the  critically  sampled 
DWT  does  not  inherit  the  important  property  of  shift-invariance.  This  means  that 
for  a  given  signature  in  a  signal,  the  wavelet  domain  representation  (approximation 
and  detail  coefficients)  will  not  remain  invariant  to  any  shift  of  the  signature.  An 
example  is  shown  in  Figure  2.5.  The  signature  in  question  is  the  rectangular  pulse. 
The  signal  and  its  wavelet  transform  have  been  obtained  (using  the  DB4  wavelet) 
and  are  shown  on  the  left.  The  signal  is  then  delayed  by  one  sample  in  time  and  its 
wavelet  transform  coefficients  are  shown  on  the  right.  As  can  be  seen  in  the  hgure, 
the  representation  is  entirely  different  under  the  shift  of  the  pulse.  This  difference 
is  especially  prominent  in  the  detail  coefficients  qi  and  q2- 

In  order  to  correct  for  the  discrepancy  in  the  wavelet  transform  of  the  two 
signals,  the  shift-invariant  DWT,  also  called  the  stationary  DWT,  can  be  used  to 
decompose  the  signal  into  its  approximation  and  detail  coefficients.  Orthogonal¬ 
ity  of  the  detail  and  approximation  coefficients  is  lost  under  this  operation,  but 
nevertheless  it  provides  a  good  tool  for  analyzing  signals  at  various  scales.  Shown 
in  Figure  2.6  are  the  previous  two  signals  and  their  respective  wavelet  coefficients 
obtained  with  the  shift-invariant  wavelet  transform  (using  the  same  DB4  wavelet). 
The  wavelet  domain  representation  of  the  second  signal  is  simply  a  single-sample 
translation  of  the  hrst  signal.  Note  from  the  hgure  that  as  compared  to  the  crit¬ 
ically  sampled  transform,  where  the  number  of  wavelet  coefficients  is  equal  to  the 
number  of  signal  samples,  the  shift-invariant  transform  is  highly  redundant.  Each 
level  of  representation  has  the  same  number  of  coefficients  as  the  original  signal. 
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This  is  due  to  the  fact  that  the  downsampling  operation  has  been  removed  from 
the  decomposition  hlterbank.  This  transform  would  not  be  appropriate  solely  for 
purposes  of  compression,  but  has  certainly  found  applications  in  the  area  of  signal 
denoising.  For  signal  transmission  purposes,  redundancy  can  be  partly  mitigated 
via  source  coding  techniques. 
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Figure  2.5  Two  similar  signals  and  their  critically  sampled  DWTs 


The  shift-invariant  wavelet  transform  coefficients  are  obtained  in  a  similar 
fashion  to  that  of  the  DWT,  with  some  exceptions.  As  mentioned  earlier,  the  down¬ 
sampling  operation  is  removed  from  the  decomposition  hlterbank.  If  reconstruction 
is  to  be  performed,  the  upsampling  operations  are  removed  from  the  reconstruc¬ 
tion  hlterbank.  Another  major  diherence  is  that  while  the  synthesis  and  analysis 
hlters  remain  the  same  throughout  the  hlterbanks  for  obtaining  the  DWT,  they  in 
fact  change  at  every  level  of  the  hlterbank  for  the  shift-invariant  transform.  At 
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Figure  2.6  Two  similar  signals  and  their  shift-invariant  DWTs 


each  level,  the  hlters  are  subjected  to  an  upsampling  operation.  The  decomposition 
paradigm  is  depicted  in  Figure  2.7. 


Figure  2.7  Decomposition  hlterbank  for  the  shift-invariant  wavelet  transform 
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2.5.  TWO-DIMENSIONAL  ANALYSIS 

The  signals  studied  in  this  work  are  two-dimensional  images,  and  the  wavelet 
transform  must  be  obtained  for  them.  Fortunately,  the  two-dimensional  wavelet 
transform  is  essentially  a  collection  of  one-dimensional  transforms.  A  two  dimen¬ 
sional  scaling  function  and  collection  of  two  dimensional  wavelets  hrst  need  to  be 
dehned.  A  separable  2D  scaling  function  is  simply  a  product  of  two  ID  scaling 
functions: 

(j){x,y)  =  (j){x)(l){y).  (2.36) 

The  2D  wavelets  are  constructed  from  the  ID  scaling  function  and  wavelet,  i.e. 

i^Hix,y)  = 'i/jix)(j){y),  (2.37) 

i^vix,y)  =  (j)ix)'i/j{y),  (2.38) 

and 

i^D{x,y)  =  ^p{x)^lJ{y).  (2.39) 

Some  two-dimensional  wavelets  used  for  image  analysis  are  not  separable.  Separabil¬ 
ity  reduces  computational  complexity  dramatically,  since  the  two-dimensional  trans¬ 
form  is  just  a  straightforward  extension  of  the  one-dimensional  transform.  The  no¬ 
tation  i/jh,  'ipV)  and  used  above  denotes  the  directional  sensitivity  of  the  wavelets. 
These  wavelets  measure  functional  variations-intensity  or  gray-level  variations  for 
images-along  different  directions.  tjjH  measures  variations  along  columns  (e.g.  hori¬ 
zontal  edges),  ipv  measures  variations  along  rows,  and  ifjo  corresponds  to  variations 
along  diagonals.  These  scaling  functions  and  wavelets  are  not  actually  used  to  derive 
LPF  and  HPF  analysis  and  synthesis  sequences,  but  instead  they  are  the  effective 
2D  wavelets  that  would  be  used  if  one  chose  not  to  use  the  fact  that  the  wavelets 
are  separable. 

The  analysis  hlterbank  for  decomposing  the  discrete  image  f{m,n)  is  shown 
in  Figure  2.8.  Only  one  iteration  of  the  decomposition  is  shown.  Further  levels 
of  the  decomposition  hlterbank  would  operate  on  the  approximation  coefficients 
p{m,  n).  The  2D  shift-invariant  wavelet  transform  hlterbank  is  similar  to  that  shown 
above  except  for  the  absence  of  the  downsamplers.  Figure  2.9  shows  a  single-level 
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Figure  2.8  Decomposition  filterbank  for  the  two-dimensional  DWT 


decomposition  filterbank.  The  hlters  would  be  upsampled  at  each  iteration  of  the 
hlterbank.  An  example  of  2D  critically  sampled  wavelet  transform  coefficients  for 
a  typical  image  is  shown  in  Figure  2.10.  The  corresponding  2D  stationary  wavelet 
coefficients  are  shown  in  Figure  2.11 


f{m,n) 


qv{l,m,n) 
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Figure  2.9  Decomposition  hlterbank  for  the  two-dimensional  shift-invariant  DWT 
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Figure  2.10  An  image  and  its  critically  sampled  wavelet  transform  coefficients 
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Figure  2.11  An  image  and  its  stationary  wavelet  transform  coefficients 
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3.  ANOMALY  DETECTION  ALGORITHMS 


Anomaly  detection  is  one  of  the  most  critical  components  of  any  airborne  land¬ 
mine  detection  platform.  Using  wavelet  transform  data  for  target  detection  within 
digital  imagery  is  a  rather  new  and  unexplored  research  area,  and  the  literature  is 
fairly  sparse.  Recent  developments  include  those  of  Zhang  and  Desai  [23],  who  have 
presented  a  target  segmentation  algorithm  incorporating  multiresolution  analyses 
of  images  and  their  probability  density  functions  (PDFs)  with  a  Bayes  classiher. 
Sadjadi  [24]  has  developed  a  clustering  algorithm  to  segment  targets  from  clutter 
using  a  multiresolution  texture-based  methodology,  again  incorporating  the  PDFs 
of  wavelet  decomposition  subbands. 

The  current  baseline  anomaly  detector  is  the  Reed-Xiaoli  (RX)  algorithm  [25] . 
This  section  introduces  the  RX  algorithm,  and  extends  the  multispectral  RX  algo¬ 
rithm  to  detect  anomalies  via  the  stationary  wavelet  decomposition  of  single-band 
imagery.  Results  for  synthetic  and  real  sensor  data  are  presented  and  discussed. 
Another  anomaly  detection  scheme  using  a  multilayer  feed-forward  neural  network 
and  stationary  wavelet  coefficients  is  also  presented. 

3.1.  THE  REED-XIAOLI  ALGORITHM 

The  RX  algorithm  is  a  constant  false  alarm  rate  (CFAR)  detection  algorithm 
which  tests  for  the  presence  of  a  known  optical  or  IR  signal  pattern  which  has 
unknown  relative  intensities  within  several  signal-plus-noise  channels  [25].  The  al¬ 
gorithm  is  derived  as  a  generalized  maximum  likelihood  ratio  test  (LRT)  used  for 
detecting  anomalous  image  regions  given  a  single-band  or  multiband  image.  The 
test  assumes  that  statistical  model  for  optical  or  IR  clutter  images  follows  a  white 
Gaussian  probability  distribution  function  (PDF).  It  is  also  assumed  that  the  local 
mean  can  change  from  region  to  region,  while  the  covariance  varies  slowly  across 
the  image.  This  algorithm  is  the  result  of  the  previous  work  found  in  [26]  and  [27]. 
The  development  of  the  algorithm  here  follows  the  notation  in  [25]. 

3.1.1.  Derivation  of  the  RX  Statistic.  The  hrst  step  of  the  RX  algorithm 
is  to  remove  the  nonstationary  local  mean  from  each  of  the  J  bands  of  the  multiband 
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image.  Each  plane  of  the  image  X  is  convolved  with  an  all  ones  window  of  size  LxL, 
denoted  W.  The  local  mean  is 


X  =  ^[X*W].  (3.1) 

The  size  of  the  window  L  can  be  chosen  so  that  the  third  moment  of  the  residual 
image  Xq  =  X  —  X  is  minimized.  This  results  in  an  image  with  an  approximately 
Gaussian  PDF. 

Next,  let  the  column  vectors  x(n)  =  [xi{n),  X2{n),  -  ■  ■  ,  xj{n)Y  for  n  =  1, . . . ,  iV 
and  N  >  J  represent  the  J  correlated  scenes  (from  the  de-meaned  image  planes 
of  Xq)  which  may  or  may  not  contain  a  target  with  known  shape.  The  known 
signal  pattern  is  represented  as  S  =  [s(l),  s(2), . . . ,  s(iV)].  Here,  the  two  dimen¬ 
sional  subimages  representing  xi,X2,  ■  ■  ■  ,xj  and  S  have  been  lexicographically  or¬ 
dered  into  one  dimensional  row  vectors.  A  signal  intensity  vector  is  then  dehned 
as  b  =  [bi,b2,  ■  ■  ■  ,bj]'^.  This  is  the  vector  of  unknown  signal  (target)  intensities  in 
each  of  the  J  bands. 

The  LRT  must  distinguish  between  the  following  binary  hypotheses: 


i/o  :  X  =  Xo 

;  X  =  Xo  +  bS, 

where  xq  is  the  vector  of  the  residual  clutter  process,  which  is  assumed  to  be  ap¬ 
proximately  independent  and  Gaussian  from  pixel  to  pixel.  This  assumption  arises 
from  the  fact  that  many  IR  images,  after  subtraction  of  the  local  nonstationary 
mean,  have  covariance  matrices  which  can  be  approximated  by  diagonal  matrices. 
The  covariance  matrix  at  a  pixel  location  (m,n)  in  the  residual  image  Xq  can  be 
estimated  from  the  subimage  surrounding  the  pixel  location  with 

M  =  E  |[x(n)  —  Ex(n)]  [x(n)  —  Ex(n)]^|  .  (3.3) 

The  parameter  space  for  this  LRT  is  dehned  as 


(3.2) 


n  =  {[b,M]  ;  |M|  >  0}  . 


(3.4) 
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Here  it  is  assumed  that  all  covariance  matrices  encountered  are  positive  definite. 
The  likelihood  function  is  given  by 

1  ^ 

—  -  ^  (x(n)  —  Ex(n))'^  (x(n)  —  Ex(n))  . 

n=l 

(3.5) 

Letting  X  =  [x(l),  x(2), . . . ,  x(iV)],  a  J  x  N  matrix  of  the  image  data,  the  likelihood 
can  be  written  as 


L(b,M)  = 


27r^J/2|M|JV/2 


exp 


L(b,M)  = 


27r^'^/2|M|^/2 


exp  <  —  Tr 


M-i  (X  -  EX)  (X  -  EX)' 


(3.6) 


The  parameter  space  is  divided  into  two  complimentary  spaces,  u  and  hi  \  cn, 
and  the  hypotheses  are  defined  as  follows 


Ho  =  [b,  M]  ecu 
Hi  =  [h,M]  e  n  \  ou 


where 

cu  =  {[0,M]  :  |M|  >  0} 
n\ou  =  {[b,M]  :  |M|  >  0,b  ^  0}. 


The  null  hypothesis  Hq  corresponds  to  no  target  being  present  in  the  given  subimage, 
while  Hi  implies  the  presence  of  a  target.  The  maximum  likelihood  ratio  test  is  then 
given  by 


max  L(b,  M) 

.  b,MeO\a;  , 

A(X)  =  ^ ^  k 


max  T(b,  M) 


b.MGc*; 


H,. 


(3.8) 


The  maxima  of  both  likelihood  functions  under  each  hypothesis  are  computed  as 


and 


max  L(h,  M) 

b,MGO\a3 


1  (^'] 

27r^A2|M^,|7v/2  2  J 


(3.9) 


max  L(h,  M) 

b,Mei.J 


1 

27r^A2|Mo|(v/2  1^  2  )' 


(3.10) 
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The  maximum  likelihood  estimates  (MLE’s)  of  the  unknown  parameters  under  each 
hypothesis  are 

1  ^  1 
n=l 

and 

1  ^  1 

Mb  =  ^  Xb(n)Xb^(n)  =  -(X  -  bS)(X  -  bS)^,  (3.12) 

n=l 


where 


(3.13) 


Note  that  is  a  scalar.  Substituting  Eqs. 


3.9  and  3.10  into  Eq.  3.8  yields  the 


LRT 


A(X) 


Hi 

Ho 


(3.14) 


An  equivalent  test  can  be  written  as 


A(X) 


|Mo| 

iMhI 


Hi 

^  c  , 

Ho 


(3.15) 


where  c  =  .  Substituting  Eqs.  3.11  and  3.12  into  Eq.  3.15  and  simplifying 

yields  the  explicit  test 


A(X) 


IXX^I 

xx^-- 

(XS^)(XS^)'^ 

ss^ 

Hi 

^  c, 
Ho 


(3.16) 


where  the  likelihood  ratio  can  be  further  simplihed  to 


A(X) 


1 


,  (XS^)'^(XX'^)-1(XS'^) 

^  ss^ 
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finally  producing  the  equivalent  test 


r(X) 


(XS^)'^(XX^)-i(XS^) 

SS^ 


Hi 

^  ro  . 
Ho 


(3.17) 


The  derivation  of  the  distribution  of  the  statistic  under  the  null  and  non-null 
hypothesis  is  not  outlined  here,  but  can  be  found  in  [25].  The  generalized  signal-to- 
noise  ratio  (GSNR),  needed  to  characterize  the  distributions,  is  dehned  as 


GSNR  =  a=  (b'^M-ib)||S||2  . 


(3.18) 


The  PDF  of  the  test  statistic  r  under  the  non-null  hypothesis  is  given  by  the  non¬ 
central  F  distribution 


f\r\Hi) 


N-J-2  J-2  _a 

(1-r)  2  r  2  e 


/iV  J  ar\ 

VY’ 2’y ) 


(3.19) 


where  0  <  r  <  1  and  iRi(a;  6;  x)  is  the  confluent  hypergeometric  function.  Under  the 
null  hypothesis  no  signal  is  present  (a  =  0),  and  the  PDF  reduces  to  the  standard 
beta  distribution 


f{r\Ho) 


(3.20) 


again  where  0  <  r  <  1.  The  receiver  operating  characteristic  (ROG)  can  be  com¬ 
puted  from  these  PDF’s  with  the  following  equations: 


Pfa 


f{r\Ho)dr  ■,  Pd  =  f{r\Hi)dr, 


>ro 


I  ro 


(3.21) 


where  Pfa  is  the  probability  of  a  false  alarm  and  Pd  is  the  probability  of  detection. 
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3.1.2.  RX  Algorithm  Implementation.  The  RX  algorithm  requires 
that  the  detection  statistic  in  Eq.  3.17  be  calculated  for  each  pixel  location  in  the 
original  single  or  multi-band  image.  The  algorithm  can  be  implemented  either  in  the 
spatial  domain  via  2-D  convolution,  or  in  the  frequency  domain  with  the  2-D  fast 
Fourier  transform  (EFT).  Each  method  has  its  own  advantages  and  disadvantages. 
Regardless  of  the  method  chosen,  the  inverse  of  a  J  x  J  covariance  matrix  must  be 
computed  at  each  pixel  location.  Clearly  this  is  a  simple  task  for  single  band  imagery, 
bnt  compntational  complexity  dramatically  increases  as  J  grows  larger,  making  the 
RX  algorithm  highly  inappropriate  for  detecting  anomalies  in  hyperspectral  imagery 
where  it  is  not  nncommon  to  encounter  J  >  100  [28]. 

Figures  3.1  and  3.2  show  two  possible  sets  of  masks  to  use  in  calculating  the 
target  mean  (b)  and  covariance  (M)  data.  Figure  3.1  shows  a  set  of  square  masks 
where  the  central  shaded  sqnare  {Wt)  is  the  target  mask,  and  the  onter  shaded 
“sqnare  annulus”  {Wc)  is  the  covariance  mask.  The  white  space  between  the  masks 
represents  a  “blanking”  region,  which  does  not  contribnte  to  the  calculation  of  the 
statistic.  This  region  can  contain  shadows  or  other  reflective  effects  of  the  target 
signatnre,  which  can  corrnpt  the  target  mean  and/or  covariance  estimates  [29].  Each 
non-zero  weight  in  each  mask  has  a  value  of  unity.  The  square  masks  with  constant 
weight  are  very  efficient  for  estimating  the  target  mean  and  covariance  data  via 
convolution.  Since  recursive  implementation  of  the  convolntion  snm  is  possible,  this 
obviates  the  need  to  repeatedly  compnte  the  snm  of  pixel  valnes  for  the  inner  regions 
of  the  mask  as  it  is  convolved  with  the  image.  Only  the  valnes  on  the  edges  of  the 
mask  need  to  be  considered  and  added  or  snbtracted  as  needed,  drastically  redncing 
compntational  cost  per  pixel.  The  final  convolution  result  is  divided  by  the  number 
of  the  non-zero  pixel  valnes  in  the  mask,  thus  the  effective  mask  has  constant  pixel 
valnes  that  snm  to  nnity. 

Estimation  of  target  mean  and  covariance  data  with  RX  masks  having  more 
complex  geometries  usually  requires  the  use  of  fast  Fourier  transform  (EFT)  based 
convolution,  since  compntational  complexity  becomes  too  high  with  brnte-force  con¬ 
volntion.  The  masks  shown  in  Fignre  3.2  have  the  same  attributes  as  the  square 
masks,  except  for  their  geometry.  There  are  three  radii  noted  in  the  hgnre.  Rt  is 
the  radius  of  the  target  mask  used  to  estimate  the  target  mean.  Rb  is  the  blanking 
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Figure  3.1 


Square  RX  masks  for  calculation  of  target  mean  and  covariance  data 


Rb 


Figure  3.2  Circular  RX  masks  for  calculation  of  target  mean  and  covariance  data 
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radius  used  to  define  the  area  of  the  inner  circle  excluded  from  the  covariance  mask. 
Rd  is  the  demeaning  radius,  which  helps  to  define  two  masks:  the  mask  used  to 
demean  each  band  of  the  original  image,  and  the  covariance  mask.  The  covariance 
mask  has  a  radial  width  of  Rn  —  Rb-  Circular  masks  are  adept  at  producing  well 
clustered  RX  statistics  for  high-contrast  circular  regions  such  as  landmines,  and  the 
detection  statistic  is  invariant  to  rotation;  however,  the  complex  geometry  of  circular 
masks  calls  for  use  of  the  FFT  to  calculate  the  mean  and  covariance  data. 

Shown  in  Figure  3.3  is  a  representative  single-band  image  of  a  portion  of  a 
minefield.  The  output  of  the  RX  algorithm  is  shown  in  Figure  3.4  where  circular 
masks  have  been  used  with  parameters  Rt  =  2,  Rd  =  20  and  Rb  =  7-  After  an 
image  has  been  processed  with  the  RX  algorithm,  the  output  is  then  subjected  to 
non-maximal  suppression.  This  is  an  algorithm  where  pixels  values  that  do  not 


Figure  3.3  A  representative  LAMD  single-band  image  with  portion  of  patterned 
minefield 


Figure  3.4  Output  of  the  RX  algorithm  for  image  in  Figure  3.3 


exceed  a  certain  threshold  are  hrst  set  to  zero.  The  remaining  pixels  are  then 
pruned  around  local  maxima  until  a  singlular  value  remains.  This  value  is  then 
used  to  determine  the  location  and  amplitude  of  the  RX  output  for  each  anomaly. 
The  distributions  of  the  detection  statistics  after  non-maximal  supression  of  the  RX 
output  has  been  studied  in  [30] .  Non-maximal  suppression  is  also  used  for  anomaly 
localization  in  the  following  algorithms. 

3.2.  A  MULTISCALE  RX-BASED  ALGORITHM 

The  multiband  RX  algorithm  is  extended  here  to  detect  anomalies  within  the 
wavelet  coefficients  of  single-band  imagery.  The  work  presented  here  shows  that 
the  detection  performance  of  this  algorithm  is  higher  than  that  of  the  single-band 
RX  algorithm  for  anomalies  in  a  correlated  noise  background.  Better  detection 
performance  is  thought  to  occur  for  two  reasons:  the  nature  of  the  wavelet  transform 
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to  decorrelate  signals,  and  the  improvement  in  the  theoretical  ROC  curves  of  the 
RX  algorithm  when  operating  under  multiple  bands  (multiple  scales  in  this  case). 

Reed  et  ai,  [31]  have  proposed  a  multi-spectral  anomaly  detector  using  the 
wavelet  transform,  although  the  framework  under  which  it  is  developed  is  quite 
different  than  the  multiscale  RX-based  algorithm  discussed  below.  Whereas  Reed 
develops  a  generalized  likelihood  ratio  test  which  operates  only  on  one  wavelet  sub¬ 
band  at  a  time,  the  test  developed  below  operates  on  all  of  the  wavelet  subband  data 
available  to  it  at  once.  Reed’s  previous  work  addresses  the  use  of  critically  sampled 
transform  coefficients  on  which  to  perform  detection.  Here,  a  stationary  wavelet 
transform  is  used  to  obtain  the  wavelet  coefficients  used  for  processing.  This  allows 
for  shift-invariance  in  the  wavelet-domain  representation  due  to  the  redundancy  of 
the  coefficients  caused  by  the  absence  of  downsamplers  in  the  decomposition  hlter- 
bank.  The  wavelet  used  in  most  of  these  simulations  is  the  Haar  wavelet,  since  it 
introduces  the  least  amount  of  translation  in  the  coefficients.  Other  wavelets  can 
be  used,  but  those  with  a  long  support  should  be  avoided  to  keep  translation  to  a 
minimum.  The  development  of  the  algorithm  follows  the  same  reasoning  as  the  RX 
algorithm  that  was  discussed  in  the  previous  section,  however  due  to  the  nature  of 
the  wavelet  coefficients  the  mathematical  derivation  is  slightly  different. 

3.2.1.  Derivation  of  the  Test  Statistic.  The  detector  must  distinguish 
between  the  following  hypotheses: 


^0  :  X  =  Xo 

i/i  ;  X  =  Xo  +  BS, 


(3.22) 


where  X  represents  the  lexicographically  ordered  wavelet  coefficients  of  different 
subbands.  X  is  of  size  J  x  N  where  there  are  J  wavelet  decomposition  subbands 
upon  which  the  test  operates,  and  N  pixel  locations  within  the  target  window.  The 
MLE  of  the  covariance  matrix  Mo  under  the  null  hypothesis  is  the  same  as  that 
given  in  Eq.  3.11,  while  the  MLE  of  the  covariance  matrix  Mb  under  the  non- null 
hypothesis  is 


MB  =  l(X-BS)(X-BSf, 


(3.23) 
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where  the  target  mean  matrix  B  is  given  by 

B  =  (XS^)(SS^)-h  (3.24) 

B  can  be  thought  of  as  a  mixing  matrix,  or  correlation  matrix.  In  contrast  to  the 
previous  formulation  of  the  RX  maximum  likelihood  estimates,  B  is  a  J  x  J  matrix 
instead  of  a  J-length  column  vector.  S  is  a  J  x  iV  template  matrix,  in  contrast 
to  an  7V-length  row  vector  as  previously  dehned.  S  must  take  into  account  that 
a  target  signature  is  decorrelated  throughout  the  subbands,  and  cannot  take  on  a 
single  representation  as  in  the  RX  algorithm.  One  possible  mechanism  to  calculate 
the  template  signature  S  is  discussed  in  section  3.2.2. 

The  likelihood  ratio  test  remains  the  same  as  developed  in  the  previous  section 
as  far  as  it  is  the  ratio  of  the  determinants  of  the  covariance  estimates  under  each 
hypothesis: 

.  Hi 

A(X)  =  ^^^1  >  c.  (3.25) 

|Mb|  < 

Ho 

This  maintains  the  interpretation  that  the  likelihood  ratio  A(X)  is  the  ratio  of  the 
hypervolumes  of  the  parallelepipeds  spanned  by  the  column  vectors  of  the  covariance 
matrices  under  each  hypothesis.  Substituting  Eq.  3.23  into  Eq.  3.22  and  expanding 
yields 


iVMB  =  XX^-XS'^(SS^)-^SX^-X[XS^(SS^)-^S]^ 
+  XS'^(SS^)-^S[XS^(SS^)-^S]'^. 


Under  the  properties  of  the  matrix  transpose  and  using  the  fact  that  the  inverse 
of  the  symmetric  matrix  SS^  is  symmetric,  the  covariance  matrix  under  Hi  can  be 
simplihed  to 


TVMb  =  xx^  -  (XS^)(SS^)-^(XS^)^. 


(3.26) 
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Substituting  Eqs.  3.11  and  3.25  into  Eq.  3.24  gives  the  explicit  test 

A(X)  =  ^ ^ — ; — L 
^  ^  |XX^-(XS^ 

3.2.2.  Target  and  Covariance  Masks.  Since  target  signatures  are 
decorrelated  throughout  the  wavelet  subbands,  the  use  of  a  single  target  mask  is  not 
appropriate  for  calculation  of  the  target  mean.  A  set  of  target  masks  is  calculated 
for  the  expected  wavelet  representation  of  the  anomaly  at  each  scale  and  orientation. 
Described  here  is  the  methodology  for  choosing  the  target  template  matrix  S  used 
to  calculate  the  test  statistic. 

In  the  spatial  domain  the  target  shape  is  assumed  to  be  circular,  but  may  lie  in 
different  poses  containing  various  subpixel  shifts  as  shown  in  Figure  3.5.  The  wavelet 
decomposition  of  the  target  in  each  pose  is  obtained,  and  then  the  coefficients  are 
averaged  over  the  different  possible  poses.  This  gives  an  expected  target  mask  for 
each  scale  and  orientation.  The  expected  wavelet-domain  masks  for  a  single-level 
Haar  decomposition  of  target  signatures  under  9  poses  with  Rt  =  3  is  shown  in 
Figure  3.6. 
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Figure  3.5  A  target  with  Rt  =  2  in  with  various  subpixel  shifted  positions 


The  covariance  mask  that  produces  the  best  detection  performance  was  found 
to  be  a  simple  square  mask,  in  contrast  to  the  annular  masks  discussed  in  the 
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(a)  LL  (b)  LH  (c)  HL  (d)  HH 

Figure  3.6  Expected  wavelet-domain  target  masks  over  9  poses  of  a  target  with 
Rt  —  3 


previous  section.  It  is  the  author’s  opinion  that  a  square  mask  inclusive  of  the 
target  produces  the  best  output  since  the  determinant  of  the  covariance  under  Hi 
can  be  thought  of  as  a  metric  which  is  a  measure  of  the  distance  between  the 
estimated  covariance  and  the  image  data  under  the  influence  of  the  target  mask 
matrix  S.  When  the  distance  is  relatively  small,  the  estimated  covariance  is  close 
to  the  covariance  of  the  target  mask,  and  thus  the  eigenvalues  of  Mb  are  small  with 
respect  to  those  of  Mq  which  in  turn  yields  a  high  likelihood  ratio  A(X). 

3.2.3.  Algorithm  Implementation.  The  multiscale  RX-based  algorithm 
is  implemented  in  the  frequency  domain  via  the  FFT,  since  the  masks  used  to 
compute  the  target  mean  have  a  rather  complex  geometry.  The  algorithm  is  outlined 
below: 

1.  The  first  step  of  the  algorithm  is  to  precalculate  the  target  masks 
S  for  a  given  target  radius  as  outlined  above.  The  Fourier  trans¬ 
forms  of  the  masks  are  calculated  and  labeled  Si{u,v).  The  inverse 
covariance  matrix  (SS^)“^  of  the  target  masks  is  precalculated  as 

well. 

2.  The  second  step  in  the  algorithm  is  to  obtain  the  stationary  wavelet 
coefficients  for  the  given  image.  A  single  level  decomposition  is  used 
to  obtain  the  coefficients.  Two-level  and  three-level  decompositions 
can  be  used,  but  contribute  to  higher  computational  complexity 
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with  only  slight  performance  gains.  The  Haar  wavelet  hlters  have 
so  far  provided  the  best  detection  results.  The  coefficients  can  be 
labeled  X  where  Xi{x,  y)  is  the  ith  subband  of  the  coefficient  data. 

3.  Once  the  decomposition  has  been  obtained,  the  approximation  sub¬ 
band  undergoes  removal  of  its  local  mean  using  a  circular  window 
of  a  size  20  pixels.  Since  the  detail  subbands  are  zero-mean,  no 
demeaning  of  them  is  necessary.  The  demeaned  output  is 

^Aix,y)  =  XA{x,y)  -  X~'^{X{XA{x,y)}d{u,v)},  (3.28) 

where  d{u,  v)  is  the  Fourier  transform  of  the  circular  window. 

4.  Local  covariance  estimates  are  then  computed  for  the  wavelet  sub¬ 
band  data.  The  covariance  mask  used  for  this  purpose  has  rect¬ 
angular  geometry.  The  dimensions  used  here  are  32  x  32.  The 
covariance  images  are  calculated  as 

Mij{x,  y)  =  X~^{X{Xi{x,  y)Xj{x,  y)}m{u,  u)},  (3.29) 

where  rh{u,v)  is  the  covariance  mask  in  the  frequency  domain. 
Mij{x,y)  is  the  ijth  entry  of  the  covariance  matrix  M  at  pixel 
location  {x,y). 

5.  Target-mean  data  is  computed  for  each  subband  via  correlation 
with  the  target  masks: 


Tij{x,y)=X  ^{X{Xi{x,y)}s*{u,v)},  (3.30) 


where  Tij  is  the  ijth  entry  of  the  matrix  T  =  XS^. 

6.  An  output  statistic  A  is  then  calculated  for  each  pixel  location  in 
the  original  input  image  with  the  equation 


A 


|M| 

M-T(SS’’)“'T’’|' 


(3.31) 
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7.  The  output  image  \{x,y)  is  subjected  to  thresholding  and  non- 
maximal  suppression.  The  remaining  pixel  locations  and  values  are 
put  into  a  list  of  target  locations  and  ordered  by  likelihood. 

3.2.4.  Results  for  Various  Imagery.  The  above  algorithm  has  been 
tested  on  both  synthetic  and  real  sensor  data.  The  synthetic  imagery  allows  for 
a  straightforward  comparison  of  the  multiscale  algorithm  with  the  RX  algorithm. 
Each  synthetic  image  is  of  size  128  x  128,  has  a  noise  variance  of  =  25  pixels,  and 
has  targets  {Rt  =  1)  populated  throughout  with  random  amplitudes  each  with  a 
possible  SNR  of  2,  5,  10,  15,  and  20.  The  SNR  here  is  the  ratio  of  target  amplitude 
to  noise  amplitude.  The  background  has  been  snbjected  to  a  lowpass  3x3  box  hlter 
in  order  to  correlate  the  noise.  Two  typical  test  images  are  shown  in  Fignre  3.7. 
The  overall  results  of  the  multiscale  algorithm  and  RX  algorithm  are  shown  in 
Figure  3.8.  Results  for  various  SNRs  are  shown  in  Figures  3.9,  3.10,  3.11,  and  3.12. 
The  algorithm  has  been  tested  with  typical  LAMD  mineheld  dataset  as  well.  The 
results  are  shown  in  Fignre  3.13. 


Figure  3.7  Two  typical  test  images  used  for  testing  of  the  multiscale  RX  algorithm 


■  MSRX  -  sim10(155),sim25(143),sim50(160),sim75(128),sim100(141) 

■  RX  -  sim10(155),sim25(143),sim50(160),sim75(128),sim100(141) 


False  Alarm  Rate  -  FA/m 


Figure  3.8  Overall  performance  of  the  multiscale  algorithm  for  all  SNRs 


10  10" 
False  Alarm  Rate  -  FA/rrf 


Figure  3.9  Performance  of  the  multiscale  algorithm  for  SNR  =  2 


Figure  3.10  Performance  of  the  multiscale  algorithm  for  SNR  =  5 
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Figure  3.11  Performance  of  the  multiscale  algorithm  for  SNR  =  1C 


False  Alarm  Rate  -  FA/m 


Figure  3.12  Performance  of  the  multiscale  algorithm  for  SNR  =  15 


Figure  3.13  Performance  of  the  multiscale  algorithm  for  a  typical  mineheld  dataset 
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For  the  synthetic  imagery,  the  multiscale  algorithm  outperforms  RX  when  con¬ 
sidering  all  targets  of  various  SNRs.  The  detection  performance  for  targets  with  high 
SNRs  is  similar  to  the  RX  algorithm,  as  expected.  The  clearest  distinction  in  per¬ 
formance  is  at  low  SNRs  and  low  false  alarm  rates  where  the  multiscale  algorithm 
clearly  gives  better  detection  results.  For  the  LAMD  sensor  data,  the  multiscale 
algorithm  and  RX  performance  are  similar.  At  the  desired  false  alarm  rate  of  10“^ 
false  alarms  per  meter^  (approximately  one  false  alarm  per  four  frames),  the  mul¬ 
tiscale  algorithm  offers  a  slight  improvement  in  detection  performance  over  the  RX 
algorithm;  however,  the  LAMD  data  evaluated  in  this  study  clearly  has  a  high  SNR 
(as  seen  in  Figure  3.3)  and  hence  the  performance  of  the  multiscale  algorithm  and 
RX  is  very  similar. 

3.3.  NEURAL  NETWORK  DETECTION  ALGORITHM 

Developed  here  is  an  anomaly  detection  algorithm  based  on  stationary  wavelet 
coefficients  and  a  multilayer  perceptron  feed-forward  neural  network  (MLPNN).  The 
goal  of  investigating  this  architecture  was  to  observe  whether  a  simple  learning  algo¬ 
rithm  could  adapt  to  wavelet  coefficient  data  as  its  input  and  provide  a  probabilistic 
measure  of  the  presence  or  absence  of  an  anomaly  at  its  output.  The  successfulness 
of  the  network  in  detecting  known  anomalies  via  wavelet  coefficient  data  provides 
insight  into  the  feasibility  of  performing  wavelet-based  detection  with  other  archi¬ 
tectures.  The  network  presented  below  is  not  to  be  confused  with  the  wavelet  neural 
networks  that  have  been  proposed  in  the  literature.  These  networks  use  wavelets  as 
their  basis  functions  and  are  particularly  suited  for  prediction  of  time-series  [32]. 

3.3.1.  Test  Imagery  and  Wavelet  Selection.  The  synthetic  images 
used  in  this  investigation  are  simplistic  in  nature  so  as  to  analyze  baseline  results.  A 
representative  synthetic  image  is  shown  in  Figure  3.14(a).  The  image  has  a  dynamic 
range  of  [0,255],  and  contains  white  Gaussian  noise  with  jj,  =  128  and  =  25. 
Targets  are  placed  in  the  image  randomly  according  to  a  uniform  distribution,  while 
making  sure  they  do  not  fall  within  a  certain  proximity  to  another  target  or  on  the 
edge  of  the  image.  The  targets  have  a  radius  of  one  pixel,  and  are  subjected  to 
random  sub-pixel  shifts.  Targets  also  take  on  random  amplitudes  which  fall  both 
above  and  below  the  level  of  the  Gaussian  noise.  The  images  are  similar  to  those 
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tested  with  the  multiscale  RX-based  algorithm,  but  here  the  uoise  is  uot  correlated. 
Wheu  au  image  is  created,  its  grouud  truth  is  recorded  aud  from  this  a  desired 
output  image  is  coustructed.  The  iuput  imagery  are  of  size  128  x  128,  aud  the 
correspoudiug  desired  output  imagery  is  64  x  64.  The  desired  output  values  are 
either  oue,  represeutiug  the  preseuce  of  a  target,  or  zero,  depictiug  its  abseuce. 
The  desired  output  image  for  the  syuthetic  image  iu  Figure  3.14(a)  is  showu  iu 
Figure  3.14(b). 


(a)  Synthetic  Image  (b)  Desired  Output 

Figure  3.14  A  test  image  aud  the  desired  output  created  from  its  grouud  truth 


As  iu  the  multiscale  RX-based  algorithm,  a  Haar  wavelet  is  utilized  for  decom- 
posiug  the  syuthetic  imagery  iuto  its  wavelet  coefficieuts.  Agaiu,  a  shift-iuvariaut 
wavelet  trausform  is  used  to  obtaiu  the  wavelet  coefficieuts  which  are  used  as  iuputs 
to  the  uetwork.  Showu  iu  Figure  3.15  is  a  test  image  aud  its  2-level  uudecimated 
wavelet  decompositiou. 

3.3.2.  Training  Data  Selection.  The  training  data  set  is  chosen  from 
a  primary  set  of  candidate  data.  These  candidate  data  are  generated  from  200 
synthetic  images  with  the  same  parameters  of  the  image  in  Figure  3.14,  except 
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(a)  Test  Image  (b)  LL2  (c)  HL2  (d)  LH2 


(e)  HH2  (f)  HLi  (g)  LHi  (h)  HH^ 

Figure  3.15  A  test  image  and  its  stationary  wavelet  transform  coefficients 


that  each  image  contains  approximately  25  targets.  After  obtaining  the  wavelet 
decomposition  of  each  image,  the  resulting  decomposition  subbands  are  split  into 
4x4  subimages.  Thus  there  are  4x4x7  =  112  inputs  into  the  neural  network, 
representing  the  vertical,  horizontal,  and  diagonal  details  of  both  levels,  and  the 
approximation  coefficients  of  the  second  level.  An  example  of  a  4  x  4  block  of  the 
original  image  which  contains  a  target  (a),  its  wavelet  coefficients  (b)  -  (h),  and  its 
desired  output  (i)  are  shown  in  Figure  3.16. 

There  are  approximately  5000  (200  images  x  25  targets)  sets  of  wavelet  coeffi¬ 
cients  representing  4x4  blocks  that  contain  a  target  in  the  original  imagery.  All  of 
these  5000  sets  are  chosen  as  training  data.  Also  chosen  are  15,000  sets  of  wavelet 
coefficients  in  the  candidate  imagery  that  represent  the  absence  of  a  target.  This 
results  in  a  training  set  with  25%  of  the  inputs  (wavelet  coefficients)  representative 
of  targets  in  the  original  image,  and  75%  without.  This  3:1  ratio  of  inputs  represent¬ 
ing  “non-targets”  to  inputs  representing  targets  is  chosen  since  a  much  greater  area 
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(a)  (b)  (c)  (d)  (e)  (f)  (g)  (h)  (i) 


Figure  3.16 


4x4  block  of  the  original  image  with  target,  4x4  blocks  of  wavelet 
transform  coefficients,  and  2x2  desired  output 


of  the  synthetic  image  used  for  testing  the  network  is  populated  with  low  variance 
noise,  not  target  signatures.  A  1:1  ratio  between  the  target  and  non-target  data 
was  found  to  contribute  to  a  higher  false  alarm  rate  at  the  network  output,  likely 
because  the  neural  network  tended  to  overtrain  on  the  data  containing  targets. 

The  input  training  data  and  desired  output  need  to  be  arranged  in  a  vector. 
The  desired  output  data  would  be  arranged  as  [1  0  0  0]^  for  the  example  shown  in 
Figure  3.16(i).  For  the  input,  let  a  4  x  4  block  of  coefficients  in  a  certain  subband,  say 
the  approximation,  be  represented  as  ,  where  the  block  of  coefficients  have  been 
rearranged  into  a  16  x  1  column  vector.  The  input  data  are  preprocessed  to  have 
zero  mean,  and  are  normalized  by  their  standard  deviation.  Call  this  resulting  data 
C ll2-  The  preprocessed  input  data  (of  length  112)  are  presented  to  the  network  in 
the  following  form: 

X  =  [C lHi  C lh2  C hLi  C hl2  C hhi  C'hh2  C'll2]^  ■  (3.32) 

3.3.3.  Network  Architecture  and  Training.  The  neural  network  used 
here  has  a  feedforward  multilayer  architecture  and  is  trained  by  the  vector-matrix 
formulation  of  the  gradient-descent  backpropagation  algorithm  [33].  There  are  112 
inputs  to  the  network,  60  hidden  layer  neurons,  and  4  output  neurons.  A  network 
diagram  is  shown  in  Figure  3.17. 

In  this  network,  a  hyperbolic  tangent  sigmoid  transfer  function  is  chosen  for 
the  hidden  layer  neurons.  This  transfer  function  captures  a  signihcant  range  of 
the  preprocessed  input  data  in  its  linear  region,  and  saturates  the  outliers.  Its 
range  is  [—1,1].  A  logarithmic  sigmoid  transfer  function  is  chosen  for  the  output 
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Figure  3.17  Neural  network  architecture  -  112  inputs,  60  hidden  layer  neurons,  4 
output  neurons 


layer  neurons.  This  function  allows  the  output  data  to  take  on  a  probabilistic 
representation,  since  its  range  is  [0,1].  Also,  the  desired  output  used  to  train  the 
network  contains  only  values  of  either  0  or  1,  so  this  function  is  more  appropriate 
than  a  tangent  sigmoid.  Mathematically  these  transfer  functions  are  expressed  as 

/i(^)  =  =  (3.33) 

The  approximately  20,000  inputs  and  corresponding  desired  outputs  are  hrst 
randomized  before  being  presented  to  the  network.  One  hundred  epochs  of  training 
with  a  learning  rate  of  0.01  gives  the  mean-squared-error  (MSE)  performance  curve 
shown  in  Figure  3.18.  The  MSE  converges  to  only  a  third  of  the  initial  value  at  the 
hrst  epoch.  Although  this  would  seem  to  characterize  bad  performance,  the  error 
remains  high  because  of  the  sparsity  of  the  desired  output  data.  Most  of  the  desired 
output  data  are  zero,  and  only  a  few  have  a  value  of  one.  The  block  diagram  in 
Figure  3.19  depicts  the  training  methodology. 

3.3.4.  Network  Testing  and  Performance  Evaluation.  Once  the 
network  has  been  trained,  its  performance  needs  to  be  characterized.  For  network 
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Figure  3.18  Training  epochs  vs.  MSE  performance 


Figure  3.19  Training  paradigm 


testing,  the  synthetic  images  are  generated  similarly  to  the  image  shown  in  Fig¬ 
ure  3.14(a),  each  populated  with  15  targets.  After  decomposing  an  image  with  the 
wavelet  transform,  a  4  x  4  moving  window  is  utilized  to  gather  data  over  all  of  the 
wavelet  subbands  to  form  the  input  to  the  network.  Using  a  moving  window  gives 
more  accurate  output  data  as  compared  to  simply  splitting  the  subbands  up  into 
one  set  of  4  x  4  blocks.  Target  signatures  can  fall  at  the  border  of  any  given  block 
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of  subband  data,  so  a  moving  window  approach  minimizes  the  chance  that  a  target 
is  missed  by  the  network. 

Unfortunately,  this  method  makes  the  presentation  of  the  inputs  to  the  net¬ 
work,  along  with  the  reconstruction  of  the  output  data,  more  computationally  ex¬ 
pensive  than  using  a  single  set  of  4  x  4  blocks  from  the  subbands.  For  any  2x2 
block  of  output  data,  the  result  is  upsampled  and  appropriately  translated  to  match 
the  corresponding  location  in  the  input  imagery.  These  output  responses  are  then 
summed,  resulting  in  much  better  detection  performance  when  compared  to  not 
using  a  moving  window  approach. 

After  the  response  has  been  computed  for  a  particular  image,  a  probabilistic 
“likelihood”  dataset  now  exists.  The  resulting  image  from  the  network  is  then 
subjected  to  non-maximal  suppression.  This  results  in  a  detection  list  with  probable 
target  coordinates  and  likelihood  values.  A  test  image,  its  neural  network  output, 
and  the  results  from  non-maximal  suppression  are  shown  in  Figure  3.20.  A  simple 
block  diagram  of  the  network  testing  methodology  is  shown  in  Figure  3.21. 


(a)  Test  Image  (b)  Nerual  Network  Out-  (c)  Non-maximal  Supres- 

put  sion 

Figure  3.20  Detection  results  for  a  test  image 


The  detection  list  resulting  from  non-maximal  suppression  is  compared  against 
the  ground  truth  generated  with  the  original  image,  and  the  receiver  operating 
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characteristic  (ROC)  is  computed.  The  ROC  can  be  used  to  measure  the  detection 
and  clustering  performance  of  the  network.  ROC  curves  were  computed  for  three 
datasets,  each  containing  50  images,  but  with  a  different  noise  variance  per  dataset. 
These  curves  are  shown  in  Figure  3.22. 


Figure  3.21  Network  testing  and  performance  evaluation 


As  can  be  seen  from  the  ROC  curves,  the  detection  performance  of  the  network 
for  the  given  test  imagery  is  fairly  good,  with  a  relatively  low  probability  of  false 
alarm  vs.  probability  of  detection.  The  performance  of  the  network  is  degraded  as 
the  noise  variance  increases,  as  expected. 
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Figure  3.22  ROC  curves  for  datasets  with  =  25,  cr^  =  50,  and  cr^  =  100 
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4.  IMAGE  REGISTRATION  ALGORITHMS 

4.1.  OVERVIEW  OF  IMAGE  REGISTRATION 

Automatic  image  registration  is  the  process  establishing  point-by-point  corre¬ 
spondence  between  two  images  of  a  scene,  with  little  or  no  human  interaction.  This 
is  a  necessary  step  for  visualizing  a  single  scene  composed  of  many  images  as  one 
image  mosaic.  The  basic  details  of  image  registration  are  presented.  Also  outlined 
here  are  a  baseline  image  registration  algorithm  utilizing  spatial  correlations  be¬ 
tween  two  images,  and  the  results  of  a  scale-space  method,  called  the  shift-invariant 
feature  transform  (SIFT).  These  algorithms  and  their  results  are  discussed  in  detail. 

4.1.1.  Terminology.  When  registering  two  images  together,  one  image  is 
called  the  reference  image,  and  the  other  the  sensed  image.  The  point-to-point  cor¬ 
respondences  found  between  the  two  scenes  are  called  control  points.  The  reference 
image  remains  stationary,  while  the  sensed  image  is  subjected  to  some  transfor¬ 
mation  or  warping  which  puts  it  into  the  same  coordinate  system  as  the  reference 
image.  When  registering  multiple  images,  after  initial  registration  of  the  hrst  two 
images,  the  previous  sensed  image  becomes  the  reference  image  and  a  new  sensed 
image  is  registered  with  this  reference  image,  and  so  on.  There  are  essentially  hve 
steps  in  performing  image  registration  regardless  of  which  techniqne  may  be  em¬ 
ployed  at  a  certain  step  of  a  registration  algorithm.  The  steps  of  basic  antomated 
image  registration  are  listed  below. 

1.  Preprocessing:  Image  smoothing,  deblurring,  removal  of  local 
mean,  edge  detection,  etc. 

2.  Feature  selection:  Extraction  of  points,  lines,  regions,  templates, 
etc.  from  an  image. 

3.  Feature  correspondance:  Determination  of  the  correspondence 
between  featnres  in  two  images  and  selection  of  control  points. 

4.  Transformation  function  estimation:  A  linear-conformal,  affine, 
projective  or  possibly  higher  order  (polynomial)  transformation  de¬ 
termined  from  the  coordinates  of  the  control  points. 
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5.  Resampling:  Interpolation  and  resampling  of  a  sensed  image  to 
the  coordinate  system  of  a  reference  image  given  the  transformation 
function. 

Studied  in  this  thesis  are  the  first  three  steps  of  the  registration  process.  The  last 
steps  -  transformation  function  estimation  and  resampling  -  are  not  trivial,  but 
remain  the  same  for  most  registration  algorithms. 

4.1.2.  Types  of  Transformations.  The  most  commonly  used  trans¬ 
formation  functions  used  in  the  baseline  and  subsequent  algorithms  are  the  linear- 
conformal  and  affine  transformations.  A  linear-conformal  mapping  is  a  subset  of  the 
affine  mappings.  It  is  composed  of  rotation,  scaling,  and  translation.  The  general 
affine  transformation  is  composed  of  theses  three  operations  plus  a  shear  operation. 
Only  two  control  point  pairs  are  needed  to  dehne  a  linear-conformal  transformation, 
and  this  is  the  mapping  that  is  used  when  the  automatic  registration  algorithm 
only  hnds  two  control  point  pairs.  Three  control  point  pairs  are  needed  to  dehne 
a  general  affine  transformation,  and  this  type  of  transformation  is  used  when  three 
or  more  control  points  are  found.  The  following  discussion  of  transformation  types 
follows  the  notation  in  [34].  Shown  in  Figure  4.1  are  an  image  (a)  and  its  mapping 
under  a  linear-conformal  (b)  and  affine  (c)  transformation. 


(a) 


(b) 


(c) 


Figure  4.1  An  example  image  and  the  image  under  a  linear-conformal  and  affine 
tranformation 
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4. 1.2.1  Linear-conformal  mappings.  The  linear-conformal  mapping  is 
performed  by  composing  a  rotation  R,  scaling  S,  and  translation  (or  displacement) 
D,  and  can  be  denoted  mathematically  as 

Pj  =  Dxo,?;o^sP't'Pi)  (4-1) 

where  P)  and  Pj  are  the  jth  pair  of  coordinates  in  the  sensed  and  reference  images 
respectively.  This  is  expressed  explicitly  as 
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or  equivalently 


x  =  xs  cos  9  —  ys  sin  9  +  xq 
y  =  xs  sin  9  +  ys  cos  9  +  yo, 


where  four  parameters  dehne  the  transformation:  xq  and  yo  are  the  displacements 
in  the  x  and  y  directions,  s  the  scaling  parameter,  and  9  the  angle  of  rotation.  The 
angle  9  is  independent  of  the  other  parameters  and  is  calculated  from  the  two  control 
point  pairs  with 

9  =  92  —  9i  =  arctan  (  ^ ^  ]  —  arctan  (  — — ^  )  .  (4.3) 

\X2-X2J  \X2-X1J 


Once  9  is  known,  the  other  three  parameters  can  be  solved  with  the  system  of  three 
equations  in  Eq.  4.2. 

4. 1.2. 2  General  afRne  mappings.  The  general  affine  transformation  in¬ 
cludes  the  component  of  shear.  This  transformation  has  six  parameters  which  can  be 
computed  from  at  least  three  non-colinear  control  point  pairs.  The  transformation 
matrix  is  given  by 
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(4.4) 
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where  {x,y)  and  {x',y')  are  the  coordinates  before  and  after  the  transformation.  If 
only  a  few  control  point  pairs  are  found  with  the  registration  algorithm,  any  error 
in  the  coordinates  usually  causes  inaccurate  registration.  When  many  control  point 
pairs  are  provided  or  found  by  an  automatic  registration  algorithm,  they  can  be 
used  to  determine  a  least-squares  estimate  of  the  six  parameters  in  Eq.  4.4.  An 
error  criteria  function  can  be  dehned  as 

n 

e(aii,  ai2,  ois,  0-21,  (I22,  ^23)  =  +  (^i2yj  +  013  ~  x'jY 

i=i 

+  {(^2lXj  +  (l22yj  +  ^23  ~ 

To  minimize  the  error  each  in  parameter,  the  six  partial  derivates  de/daij  are  evalu¬ 
ated  and  the  expressions  are  set  equal  to  zero  which  yields  a  system  of  six  equations 
represented  in  the  following  matrix 
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(4.5) 

where  the  summations  are  over  j  from  1  to  n.  n  is  the  number  of  control  point 
pairs  provided  to  estimate  the  transformation  parameters.  Solving  the  above  matrix 
equation  for  an, ... ,  023  gives  the  least-squares  estimate  of  the  parameters. 

4.1.3.  Registration  Example.  Shown  in  Figure  4.2  is  an  example  of  reg¬ 
istration  performed  on  typical  LAMD  imagery.  Four  frames  have  been  selected  and 
registered  together  using  general  affine  transformations  with  the  correlation-based 
technique  for  control  point  selection.  The  frames  shown  in  the  hgure  have  a  large 
amount  of  overlap  and  have  been  stitched  together  with  little  error.  Correlations 
were  performed  using  subimages  of  size  32  x  32  taken  from  the  reference  images. 


51 


Figure  4.2  Four  LAMD  frames  registered  with  the  correlation-based  method  and 
affine  transformations 
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4.2.  CORRELATION-BASED  METHOD 

The  current  baseline  registration  algorithm  used  to  register  a  set  of  image 
frames  is  based  on  a  correlation  technique.  Given  two  images,  a  subimage  of  the 
reference  image  is  selected  by  some  desired  criterion,  and  then  a  correlation  of  the 
subimage  is  performed  with  the  sensed  image.  If  the  maximum  value  of  correlation 
at  some  location  in  the  sensed  image  exceeds  a  threshold,  the  corresponding  points 
are  recorded  and  used  to  estimate  the  transformation  function. 

4.2.1.  Reference  Image  Feature  Selection.  In  this  algorithm,  the 
feature  extracted  from  the  reference  image  is  a  small  region  of  the  image,  possibly 
with  a  size  of  32  x  32  or  64  x  64  pixels.  Smaller  regions,  such  as  16  x  16  pixels, 
tend  to  contribute  to  higher  numbers  of  inaccurate  correlations,  which  can  cause 
registration  errors.  Larger  regions  can  cause  problems  when  there  is  a  small  amount 
of  overlap  between  the  referenced  and  sensed  images.  A  larger  region  may  also  result 
in  poor  correlation  if  rotation  or  warping  is  signihcant.  If  the  region  is  too  large, 
it  is  possible  that  no  maximum  correlation  value  will  reach  the  minimum  threshold 
and  the  number  of  control  points  the  algorithm  is  able  to  hnd  might  be  small. 

These  subimages  are  selected  based  a  maximum  energy  criterion.  Over  any 
given  M  X  N  region  of  the  image,  the  energy  is  calculated.  Here  M  and  N  are 
the  dimensions  of  the  subimages  used  for  calculating  the  correlations.  A  flat  mask 
h{x,  y)  of  size  M  by  A  with  values  1/MN  is  convolved  with  the  squared  pixel  values 
of  the  reference  image,  resulting  in  a  local  mean-squared  image.  Let  the  reference 
image  be  denoted  by  I{x,y).  Then  the  local  mean-squared  image  is 

Ims{x,  y)  =  y)}h{u,  u)},  (4.6) 

where  the  spatial  convolution  has  been  performed  in  the  frequency  domain  and 
h{u,  v)  is  the  Fourier  transform  of  the  mask  h{x,  y).  A  local  mean  image  Im{x,  y)  is 
then  computed  with 


Im{x,y)^T  '^{T{Ir{x,y)]h{n,v)]. 


(4.7) 
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The  local  energy  at  each  location  in  the  image  can  then  be  computed  in  the  root- 
mean-square  (RMS)  sense,  i.e. 


The  RMS  image  is  then  subjected  to  non-maximal  suppression,  and  a  list  of  high- 
energy  locations  is  created.  The  list  can  be  sorted  from  the  highest  energy  locations 
descending  or  it  can  be  randomized.  The  current  baseline  algorithm  randomizes  the 
locations  to  be  chosen  as  features,  since  simply  choosing  the  highest-energy  locations 
might  result  in  clustering  of  features  around  a  single  high-energy  locality  in  the 
image,  such  as  a  single  bush  or  tree  in  a  highly  correlated  background.  Randomizing 
the  list  helps  to  spread  the  chosen  features  more  uniformly  across  the  image,  which 
results  in  more  accurate  estimation  of  the  transformation  function  once  the  control 
points  have  been  computed. 

4.2.2.  Correlation  Calculations.  Once  the  list  of  feature  locations  from 
the  reference  image  has  been  created,  the  subimages  surrounding  those  locations  are 
chosen  and  then  a  correlation  calculation  is  made  with  the  sensed  image.  The  two- 
dimensional  correlation  coefficient  r  for  two  matrices  A  and  B  both  of  size  M  x  N 
is  given  by 


y^(Amn  -  A)(Rmn  “  B) 

m  n 

m  n  /  \  m  n  / 


where  A  and  B  are  the  means  of  the  matrices  A  and  B.  It  is  quite  expensive  to 
compute  this  correlation  coefficient  at  every  pixel  location  as  the  reference  subimage 
moves  across  the  sensed  image,  so  the  computation  is  performed  in  the  frequency 
domain. 

First,  the  RMS  values  y)  of  the  sensed  image  /'(x,  y)  are  obtained  in  the 

same  manner  as  the  reference  image,  using  the  same  flat  mask  h{x,y).  The  Fourier 
transform  of  the  sensed  image  is  also  obtained,  denoted  by  I'{u,v).  Then  the  mean 
of  the  hrst  subimage  chosen  from  the  reference  image  feature  list  is  subtracted  and 
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then  this  result  is  normalized  to  have  unit  energy.  Let  this  preprocessed  “chip” 
be  denoted  g{x,y).  This  terminology  is  borrowed  from  the  impulse  response  of 
the  matched  hlter  in  a  digital  communications  system.  The  following  correlation 
calculation  is  performed  in  the  frequency  domain 

fi{u,v)  =  I\u,v)g*{u,v),  (4.10) 

where  *  denotes  the  complex  conjugate.  The  hnal  correlation  coefficients  of  the  chip 
and  sensed  image  are  given  by 

^  _ _ n{x,y) _  ^ 

VMNI'rra.(x,V)  VMN 

If  the  maximum  of  the  correlation  values  r{x,y)  exceeds  a  threshold,  then  the  lo¬ 
cation  of  the  peak  is  found  and  the  coordinates  of  the  subimage  taken  from  the 
reference  image  and  the  location  of  the  correlation  peak  in  the  sensed  image  are 
put  into  the  list  of  control  point  pairs.  This  process  is  then  repeated  for  the  next 
subimage  in  the  reference  image  feature  list. 

4.3.  SCALE-INVARIANT  FEATURE  TRANSFORM 

The  scale-invariant  feature  transform  (SIFT)  is  a  novel  algorithm  developed 
by  David  Lowe  [20]  which  is  used  to  extract  invariant  features  that  can  be  used  to 
reliably  match  different  views  of  an  object  or  scene.  The  features  extracted  from  a 
given  image  are  relatively  invariant  to  signihcant  amounts  of  scaling  and  affine  warp. 
Presented  in  this  thesis  is  how  SIFT  features  can  be  used  to  aid  in  image  registration. 
The  algorithm  is  able  to  successfully  hnd  control  points  without  resorting  to  a 
correlation  based  method  by  matching  features  in  a  high-dimensional  feature-space. 
The  current  implementation  of  the  SIFT  algorithm  uses  a  brute-force  search  to  hnd 
the  minimum  distance  between  reference  image  features  and  sensed  image  features, 
which  is  computationally  expensive.  Other  matching  algorithms  such  as  the  Best- 
Bin-First  (BBF)  algorithm  [35]  can  be  used  in  future  implementations  to  search 
through  the  feature  space  relatively  quickly  as  compared  to  brute-force  algorithms. 

4.3.1.  Overview  of  SIFT.  The  SIFT  algorithm  can  efficiently  extract  many 
features  over  many  scales  and  location  of  a  single  image.  A  typical  image  of  size 
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500  X  500  gives  rise  to  approximately  2000  stable  features.  There  are  five  important 
steps  to  extracting  these  invariant  features  from  an  image.  These  steps  are  listed 
below: 

1.  Scale-space  construction:  A  scale  space  is  constructed  for  the 
given  image  by  repeatedly  convolving  the  image  with  a  Guassian 
kernel.  Each  plane  of  the  scale  space  is  then  subtracted  from  its 
adjacent  plane  to  obtain  a  difference-of-Gaussian  image.  The  set  of 
difference-of-Gaussian  images  is  used  to  perform  extrema  detection. 

2.  Scale-space  extrema  detection:  A  search  over  all  possible  scales 
and  locations  in  the  difference-of-Gaussian  images  is  performed  to 
hnd  candidate  points  which  are  invariant  to  scale  and  orientation. 

3.  Key  point  localization:  Interpolation  is  performed  to  accurately 
£t  a  candidate  feature  location  (keypoint)  in  location  and  scale. 
Keypoints  are  then  refined  based  on  measures  of  their  stability. 

4.  Orientation  assignment:  One  or  more  orientations  are  assigned 
to  each  keypoint  based  on  the  local  image  gradient  directions  at 
the  keypoint ’s  scale  in  scale  space.  Any  further  operations  are  per¬ 
formed  relative  to  the  this  orientation  to  provide  invariance  to  any 
affine  transformation. 

5.  Keypoint  descriptor  construction:  At  the  selected  scale  and 
location  of  each  keypoint,  the  surrounding  image  gradients  are  used 
to  construct  a  keypoint  descriptor.  The  descriptor  is  essentially  the 
location  of  each  keypoint  in  the  feature  space. 

The  keypoint  descriptors  are  very  distinctive,  allowing  a  single  feature  in  the  ref¬ 
erence  image  to  be  matched  to  its  corresponding  feature  in  the  sensed  image  with 
high  probability.  The  figures  presented  in  the  following  discussion  have  been  adapted 
from  the  original  paper  by  Lowe  [20]. 

4.3.2.  Scale-space  Construction.  The  first  step  of  the  SIFT  algorithm 
is  the  construction  of  scale  space  for  a  given  image.  The  scale  space  is  defined  as  a 
fnnction  L{x,y,  a)  that  is  produced  from  a  convolution  of  a  variable-scale  Gaussian 
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function  G{x,y,a)  with  an  image  I{x,y).  Specifically, 

L{x,y,a)  =  G{x,y,(T)  *  I{x,y),  (4.12) 

where 

G{x,y,a)  =  exp 

Keypoints  are  located  not  in  this  scale  space,  but  rather  in  the  difference-of-Gaussian 
images  that  result  from  the  difference  of  adjacent  planes  in  scale  space  separated  by 
a  multiplicative  constant  h. 

Dix,y,a)  =  [G{x,y,ka)  -  G{x,y,a)]*  I{x,y) 

=  L{x,y,ka)  -  L{x,y,a). 


-{x^  +  y  ) 

2a2 


(4.13) 


The  reasoning  behind  looking  for  keypoints  in  these  difference-of-Gaussian  images  is 
twofold.  First,  D  is  relatively  easy  to  compute  with  a  point-by-point  subtraction  of 
the  two  planes  of  scale  space.  Second,  the  difference-of-Gaussian  function  provides 
a  close  approximation  to  the  scale-normalized  Laplacian-of-Gaussian  It  has 

been  found  that  the  extrema  of  produce  the  most  stable  image  features 

as  compared  to  other  possible  operations,  such  as  the  gradient  and  Harris  corner 
function  [36].  In  [20]  Lowe  describes  the  relationship  between  D  and  a^V^G  via  the 
heat  diffusion  equation 


^=aV^G,  (4.14) 

here  parameterized  in  terms  of  a  rather  than  t  =  An  approximation  of  nV^G  is 
provided  from  the  hnite  difference  approximation  of  dG/da  using  the  difference  of 
adjacent  scales  ka  and  a: 


aV^G 


dG  G{x,y,ka)  —  G{x,y,a) 

da  ka  —  a 


(4.15) 


or  equivalently 


G{x,  y,  ka)  —  G{x,  y,  a)  ~  (/c  —  Ija^V^G. 


(4.16) 
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Thus  the  difference-of- Gaussian  D  is  an  approximation  of  the  Laplacian-of-Gaussian 
multiplied  by  the  constant  factor  k  —  1.  The  approximation  error  approaches 
zero  as  /c  — 1,  but  the  error  has  little  impact  on  extrema  detection  even  when  there 
are  large  differences  in  scale. 

The  construction  of  the  scale  space  L{x,  y,  a)  and  difference-of-Gaussian  images 
D{x,y,a)  is  depicted  in  Figure  4.3.  The  original  image  I{x,y)  is  first  expanded 
via  bilinear  interpolation,  doubling  its  dimensions  and  quadrupling  its  area.  This 
image  expansion  yields  more  keypoints  from  the  SIFT  algorithm.  This  base  image 
is  then  incrementally  convolved  with  Gaussian  kernels  G{x,y,k^a)  at  a  scale  /c*, 
where  0  <  i  <  s  +  2,  for  integer  i.  s  is  the  number  of  intervals  that  an  octave, 
which  represents  a  doubling  of  a,  is  divided  into,  k  is  chosen  such  that  k  =  2^/®. 
For  each  octave,  s  +  3  blurred  images  are  created  so  that  the  detection  of  extrema 
covers  the  complete  octave.  Once  a  complete  octave  has  been  created,  the  image  in 
scale  space  which  has  twice  the  initial  value  of  a  is  downsampled  by  a  factor  of  two 
and  this  result  is  used  to  construct  the  next  octave  of  the  scale  space.  The  current 
implementation  of  the  SIFT  algorithm  uses  s  =  3  with  four  octaves,  so  k  =  \f2. 
Shown  in  Figure  4.3  are  two  octaves  with  s  =  2. 

4.3.3.  Reduction  of  Computational  Complexity.  To  compute  the 
scale  space,  the  convolution  operation  is  not  performed  as  explicitly  stated  above. 
One  feature  of  the  two-dimensional  Gaussian  kernel  is  that  it  is  separable.  This  is 
exploited  in  the  current  implementation.  Separability  of  the  kernel  arises  since  it 
can  be  written  as 
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(4.17) 


This  allows  a  one-dimensional  convolution  to  be  performed  along  rows.  This  inter¬ 
mediate  result  is  then  convolved  with  the  one-dimensional  kernel  along  the  columns. 
This  method  dramatically  reduces  the  computational  complexity  required  to  per¬ 
form  the  convolution  from  O(n^)  to  0(2n). 

The  convolution  operation  obeys  the  associative  property  and  the  convolution 
of  two  Gaussian  functions  is  also  a  Gaussian  function.  Using  these  two  facts  allows 
computation  to  be  further  reduced  when  computing  the  scale  space.  Progressively 
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Figure  4.3  Construction  of  scale  space  and  difference-of-Gaussian  images 


larger  Gaussian  kernels  need  not  be  convolved  with  the  base  image  to  obtain  the 
scale  space  images  at  higher  scales,  but  instead  kernels  of  the  same  size  can  be  con¬ 
volved  with  the  image  at  a  lower  scale  adjacent  to  the  scale  being  computed.  These 
operations  help  make  the  SIFT  algorithm  very  fast  since  no  complex  convolutions 
or  FFTs  need  to  be  performed. 

4.3.4.  Scale-space  Extrema  Detection.  Once  the  difference-of-Gaussian 
images  have  been  computed,  the  next  step  of  SIFT  is  to  locate  the  extrema  within 
these  images.  This  is  performed  by  searching  for  either  a  maximum  or  minimum  at 
each  location  and  scale  within  these  images.  Each  sample  point  is  compared  to  its 
eight  neighbors  within  its  scale,  and  its  nine  neighbors  at  both  the  adjacent  higher 
and  lower  scales.  This  is  depicted  in  Figure  4.4,  where  the  sample  point  is  denoted 
with  an  “X”.  If  the  point  is  found  to  be  an  extremum,  its  location  and  scale  are 
recorded.  Since  most  points  will  be  discarded  as  extrema  after  the  hrst  few  checks  of 
its  adjacent  neighbors,  this  search  requires  very  little  computation  and  is  performed 
very  rapidly. 
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Figure  4.4  A  sample  point  compared  to  its  nearest  neighbors  in  both  location  and 
scale 


4.3.5.  Keypoint  Localization  and  Refinement.  After  finding  the  ex¬ 
trema  within  the  difference-of-Gaussian  images,  the  values  surrounding  an  extremum 
are  used  to  accurately  interpolate  its  location  in  both  the  x  and  y  coordinates  and  in 
scale  a.  This  is  performed  by  fitting  a  3D  quadratic  function  to  the  points  surround¬ 
ing  the  extremum.  The  approach  suggested  by  Lowe  uses  a  Taylor  series  expansion 
up  to  the  quadratic  terms  of  the  function  D{x,  y,  a)  where  the  origin  is  at  the  sample 
point: 

,  ,  dD^  1 

where  x  =  {x,y,a)'^  is  the  offset  from  the  sample  point,  dD/dx  is  a  row  vector  of 
derivatives  with  respect  to  x,  y,  and  a,  and  d'^D/d^  is  the  3x3  Hessian  matrix. 
Evaluating  dD(x)/(9x,  setting  the  expression  to  zero  and  solving  gives  the  location 
of  the  extremum  as 


X  =  — 


d'^D 

d's? 


-1 


dD 
(9x  ’ 


(4.19) 


Both  the  derivative  of  D  and  its  Hessian  are  approximated  using  differences  of  neigh¬ 
boring  sample  points.  If  the  value  of  the  offset  is  larger  than  0.5  in  any  direction,  the 
extremum  lies  closer  to  a  different  sample  point.  This  point  is  then  found  and  the 
process  is  repeated  about  this  new  point.  The  interpolated  value  of  the  extremum 
Il(x)  is  used  to  reject  any  unstable  extrema  with  a  low  contrast.  Upon  substituting 
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Eq.  4.19  into  Eq.  4.18,  the  interpolated  value  becomes 

B(x)=fl+~^x.  (4,20) 

The  current  SIFT  implementation  discards  any  extrema  where  |hl(x)|  <  0.04,  where 
the  input  image  was  hrst  normalized  to  range  from  [0, 1]. 

Once  the  low  contrast  extrema  have  been  discarded,  any  keypoints  arising 
from  edges  in  the  difference-of-Gaussian  function  need  to  be  eliminated  as  well.  The 
difference-of-Gaussian  function  has  a  strong  response  along  any  edges  in  the  image 
and  these  are  unstable  to  small  amounts  of  noise.  To  determine  any  poorly  dehned 
peaks  in  the  difference-of-Gaussian  images,  the  principal  curvatures  are  examined.  A 
poorly  defined  peak  will  have  a  large  principal  curvature  along  the  edge,  but  a  small 
curvature  in  the  perpendicular  direction.  The  principal  curvatures  are  computed 
with  the  Hessian  matrix  H  at  the  location  and  scale  of  the  keypoint,  where 

(4.21) 

The  eigenvalues  of  the  Hessian  H  are  proportional  to  the  principal  curvatures  of  D. 
Instead  of  explicitly  computing  the  eigenvalues  of  H,  the  ratio  of  principal  curvatures 
can  be  computed  by  evaluating  only  the  ratio  of  the  eigenvalues.  This  approach  is 
used  by  Lowe  and  is  attributed  to  Harris  and  Stephens  [37].  Suppose  a  and  f3  are 
the  larger  and  smaller  eigenvalues  respectively.  Then  the  sum  and  product  of  the 
eigenvalues  can  be  computed  with  the  trace  and  the  determinant  of  H; 


Tr(H)  —  Dxx  +  Dyy  —  a  -\-  j3, 

(4.22) 

det(H)  =  DxxDyy  -  {Dxyf  =  a(3. 

(4.23) 

If  the  determinant  is  negative,  then  the  point  is  discarded  as  being  an  extremum. 
Let  r  be  the  ratio  of  the  principal  curvatures  (ratio  of  the  two  eigenvalues)  so  that 

ry  =  rR  thpTl 

Tr(H)2  _  (a  +  _  {r(3  +  _  (r  +  l)^ 

det(H)  af3  r(3‘^ 


H 


Dxx  Dxy 
^xy  ^yy 


r 


(4.24) 
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So  for  a  given  ratio  r,  if  the  inequality 


Tr(H)2  (r  +  1)^ 
det(H)  ^  r 


(4.25) 


is  not  satisfied,  the  point  is  discarded  as  being  a  poorly  dehned  peak.  The  current 
implementation  eliminates  points  where  r  >  8. 

4.3.6.  Orientation  Assignment.  To  achieve  invariance  to  any  possi¬ 
ble  rotation  of  a  feature,  an  orientation  is  assigned  to  all  candidate  points  which 
have  survived  the  process  of  localization  and  rehnement.  Rather  than  working 
with  difference-of-Gaussian  images,  further  calculations  are  all  performed  with  the 
Gaussian-smoothed  images  L{x,y,a).  Prior  to  computing  the  orientation  for  a  sin¬ 
gle  keypoint,  the  gradient  magnitudes  m{x,y,a)  and  orientations  9{x,y,(j)  of  all 
points  in  L  are  precomputed  by  using  pixel  differences  with 


m{x,y,a) 


\J[L{x  +  l,y,(T)  -L{x 


1,  y,  +  [L{x,  y  +  l,a)-  L{x,  y 


(4.26) 


and 

L{x,y  +  I,  a)  -  L{x,y-  I,  a) 

L{x  -F  l,y,a)  -  L{x  -  l,y,a) 

Once  the  gradient  magnitudes  and  orientations  have  been  computed,  an  orientation 
histogram  is  formed  by  using  an  area  of  pixel  values  (of  magnitude  and  orientation) 
surrounding  the  location  at  the  nearest  scale  to  that  of  the  keypoint.  Gurrently, 
a  15  X  15  region  around  the  keypoint  location  is  used  to  construct  the  orientation 
histogram.  The  magnitudes  in  this  region  are  weighted  by  a  circular  gaussian  kernel 
with  a  standard  deviation  1.5  times  that  of  the  scale  a.  This  keeps  strong  gradient 
magnitudes  which  might  lie  several  pixels  away  from  the  keypoint  location  from 
contributing  to  an  incorrect  orientation.  Each  magnitude  is  then  placed  into  the 
histogram  bin  corresponding  to  the  gradient  orientation.  The  histogram  has  36  bins 
each  representing  10°  to  cover  the  360°  range  of  orientations.  The  maximum  peak 
of  the  histogram  is  found,  and  then  a  parabola  is  fit  to  it  and  the  points  on  each 
side  of  the  peak  to  better  interpolate  the  location  of  the  maximum.  From  this  peak, 
an  orientation  is  assigned  to  the  keypoint.  If  another  peak  in  the  histogram  has  a 
height  80%  that  of  the  maximum,  the  interpolation  process  is  repeated  and  another 


9{x,  y,  a)  =  arctan 
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keypoint  is  created  with  this  orientation.  As  stated  by  Lowe,  and  from  our  own 
observations,  approximately  15%  of  keypoints  are  assigned  two  orientations.  These 
additional  orientations  aid  in  increasing  the  stability  of  matching. 

4.3.7.  Keypoint  Descriptor  Construction.  The  keypoint  descriptor 
is  essentially  a  point  in  a  128-dimensional  feature  space  which  can  be  matched 
with  high  probability  against  another  descriptor  representing  the  same  feature  in  a 
different  image.  The  descriptor  is  constructed  by  using  a  method  similar  to  that  used 
to  find  the  keypoint  orientation.  A  16  x  16  region  of  the  magnitudes  and  orientations 
around  the  keypoint  and  nearest  its  scale  is  extracted  and  the  magnitudes  subjected 
to  a  Gaussian  window  with  standard  deviation  half  that  of  the  window  size  (<7  =  8). 
Again,  the  Gaussian  is  used  to  attenuate  any  large  gradient  magnitudes  which  may 
he  several  pixels  away  from  the  keypoint  location.  All  orientations  in  this  region 
are  then  rotated  relative  to  the  orientation  of  the  keypoint  to  provide  for  rotation 
invariance  of  the  descriptor. 

Once  the  preprocessing  of  the  magnitudes  and  orientations  has  been  performed, 
the  16x16  region  is  split  into  16  separate  4x4  regions.  For  each  subregion,  the 
magnitude  of  the  gradients  are  then  placed  into  histogram  with  8  bins  each  covering 
a  45°  range  of  orientation.  This  operation  is  depicted  in  Figure  4.5. 
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Figure  4.5  Gonstruction  of  a  descriptor  from  gradient  magnitudes  and  orientations 
surrounding  a  keypoint 
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Rather  than  the  2x2  descriptor  array  shown  above,  the  current  SIFT  imple¬ 
mentation  uses  a  4  X  4  descriptor  array  where  each  element  of  the  array  contains 
an  eight-bin  histogram.  This  results  in  a  128-dimensional  feature  vector.  This  vec¬ 
tor  is  normalized  to  unit  length  before  being  stored  in  a  database  of  keypoints  and 
their  descriptors  for  a  given  image.  Normalization  of  the  vector  helps  to  reduce 
the  effects  of  illumination  change  from  image  to  image.  Once  the  feature  vectors 
have  been  computed  for  two  adjacent  images,  the  current  implementation  of  SIFT 
employs  a  brute-force  search  to  find  the  minimum  distance  between  the  features  in 
the  reference  and  sensed  images.  This  is  currently  the  most  expensive  portion  of 
the  algorithm. 

4.3.8.  Results.  The  results  of  the  SIFT  algorithm  are  shown  graphically 
in  Figures  4.6  and  4.7.  The  30  points  with  smallest  minimum  distance  matched  in 
the  feature  space  of  both  images  are  shown.  The  numbers  in  the  images  correspond 
to  the  distance,  with  1  being  the  smallest  distance  and  30  being  the  largest.  Note 
that  points  17,  19,  and  29  are  erroneous.  The  registration  of  two  different  segments 
of  LAMD  images  is  shown  in  Figure  4.8.  The  first  segment  (a)  is  a  collection  of 
22  images.  The  second  (b)  is  a  collection  of  10  images.  As  can  be  seen  in  the  last 
hgures,  the  overall  registration  result  is  relatively  good  and  compares  well  with  the 
correlation-based  algorithm. 

4.3.9.  Further  Applications  of  SIFT.  The  SIFT  algorithm  has  been 
implemented  in  this  work  to  aid  in  the  task  of  automatic  control  point  selection 
specihcally  for  image  registration.  SIFT  features  can  be  readily  adapted  to  other 
applications,  such  as  object  recognition  and  change  detection.  In  [20],  Lowe  dis¬ 
cusses  the  use  of  the  Hough  transform  to  identify  clusters  which  belong  to  a  single 
object  and  then  performing  verihcation  of  object  determination  via  the  least-squares 
solution  for  consistent  pose  parameters.  This  allows  SIFT  features  to  be  used  to 
easily  and  consistently  identify  objects  among  clutter  and  occlusion  with  relatively 
little  computational  complexity  as  compared  to  other  algorithms  performing  this 
task. 
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Figure  4.6  SIFT  control-points  for  a  LAMD  sensed  image 


Figure  4.7  SIFT  control-points  for  a  LAMD  reference  image 
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5.  CONCLUSIONS  AND  FUTURE  WORK 


In  this  thesis  several  multiscale  methods  applied  to  the  important  processes  of 
anomaly  detection  and  image  registration  have  been  presented  and  discussed.  This 
work  has  shown  that  either  wavelet  domain  or  scale-space  methods  can  be  used  for 
these  applications  and  can  perform  as  well  or  better  than  the  baseline  algorithms, 
albeit  with  greater  computational  complexity. 

The  wavelet  transform  and  its  shift-invariant  counterpart  have  been  presented 
and  discussed.  The  use  of  the  shift-invariant  wavelet  transform  for  target  detec¬ 
tion  has  been  explored  using  two  different  architectures:  one  based  on  a  statistical 
test,  and  the  other  a  multilayer  perceptron  neural  network.  Results  in  the  form 
of  ROC  curves  have  been  use  to  characterize  performance  of  these  algorithms,  and 
this  performance  has  been  compared  to  the  baseline  detection  algorithm.  Future 
investigations  which  assess  the  performance  of  other  wavelet-based  detection  archi¬ 
tectures  presented  in  the  literature  could  possibly  validate  the  wavelet  transform  as 
an  invaluable  tool  for  anomaly  detection.  Further  studies  into  the  computational 
complexity  of  such  architectures  are  needed. 

The  use  of  scale-space  theory  has  been  applied  to  the  problem  of  image  registra¬ 
tion  through  the  scale-invariant  feature  transform.  The  steps  of  this  novel  algorithm 
have  been  outlined  in  detail,  and  results  of  the  application  of  the  algorithm  to  image 
registration  have  been  presented.  The  use  of  SIFT  features  for  change  detection  and 
object  recognition  is  very  likely  in  the  near  future. 


APPENDIX  A. 

C++  FUNCTION  PROTOTYPES  FOR  CORRELATION-BASED 
REGISTRATION  AND  RX  DETECTION 
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Included  in  this  appendix  are  C++  interface  files  containing  function  proto¬ 
types  for  correlation-based  registration  and  RX  single  and  multi-band  anomaly  de¬ 
tection.  This  code  was  developed  for  implementing  baseline  anomaly  detection  and 
image  registration  algorithms  on  groundstation  hardware  provided  by  the  Coun¬ 
termine  Division  of  the  Night  Vision  and  Electronic  Services  Directorate  of  the 
U.S.  Army.  There  are  three  interface  hies:  imageUtils  .h,  providing  basic  image 
processing  utilities,  detectionTools  .h,  providing  RX  processing  capability,  and 
registrationTools  .h,  providing  correlation-based  control  point  selection.  Fast 
Fourier  transforms  were  performed  using  the  open  source  FFTW  library. 


1.  imageUtils. h 

#ifndef  IMAGEUTILS_H 
#define  IMAGEUTILS_H 

/* 

*  imageUtils.h 

* 

*  Created  by  jbarnes  on  9/25/05. 

*  Jeff  Barnes 

*  jbarnes@umr.edu 

*  Copyright  2005 

*  Airborne  Reconnaissance  and  Image  Analysis  Laboratory 

*  University  of  Missouri  -  Rolla 

*  All  rights  reserved. 

* 

*/ 

#include  <iostream> 

#include  <cmath> 

#include  <vector> 

#include  <cstdlib> 

#include  <ctime> 

using  namespace  std; 

#include  " Image. h" 

#include  "fftwB.h" 


void  fft2(int  numRows, 
int  numCols, 
double  *inData_space , 
fftw_complex  *outData_f req) ; 
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/*  Description:  Forward  2D  Fast  Fourier  Transform 
* 

*  Inputs:  uumRows  -  number  of  rows 

*  numCols  -  number  of  columns 

*  *inData_space  -  pointer  to  spatial  domain  data 

* 

*  Output :  *outData_f req  -  pointer  to  complex  FFT  data 
*/ 

void  ifft2(int  fftRows, 
int  fftCols, 

fftw_complex  *inData_f req, 
double  *outData_space) ; 

/*  Description:  Inverse  (backward)  2D  Fast  Fourier  Transform 

* 

*  Inputs:  fftRows  -  number  of  rows  of  FFT  data 

*  (same  as  spatial  domain) 

*  fftCols  -  number  of  columns  of  FFT  data 

*  (floor (columns/2)  +  1  of  spatial 

*  domain  columns) 

*  *inData_freq  -  pointer  to  FFT  data 

* 

*  Output:  *outData_space  -  pointer  to  spatial  data 
*/ 

void  createBoxMask(int  height, 

int  width, 
double  *boxMask) ; 

/*  Description:  Create  a  flat  rectangular  mask  where 

*  all  pixel  values  sum  to  unity 

* 

*  Inputs:  height  -  number  of  rows  in  mask 

*  width  -  number  of  columns  in  mask 

* 

*  Output :  *boxMask  -  pointer  to  mask  data 
*/ 

void  createBoxMaskFFT(int  height, 

int  width, 
int  numRows, 
int  numCols, 

fftw_complex  *boxMaskFFT, 
int  rowCen=-l, 
int  ColCen=-l) ; 

/*  Description:  Create  FFT  spectrum  of  rectangular  mask 

*  where  the  mask  has  been  zero  padded 

* 

*  Inputs:  height  -  number  of  rows  in  spatial  mask 
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* 

* 

* 

* 

*  Output 
*/ 

void  createCircMaskCint  radius, 

double  *circMask) ; 

/*  Description:  Create  a  circular  mask  of  a  certain  radius  where 

*  all  values  sum  to  unity 

* 

*  Input :  radius  -  radius  of  the  mask 

* 

*  Output:  *circMask  -  pointer  to  mask  data 
*/ 

void  myfft2(int  height, 
int  width, 
int  numRows, 
int  numCols, 
double  * image, 
fftw_complex  *imageFFT, 
int  rowCen=-l, 
int  colCen=-l) ; 

/*  Description:  Forward  2D  FFT  with  shifted  spectrum 

*  (the  first  and  third  quadrants  and 

*  second  and  fourth  quadrants  are  swapped) 

* 

*  Inputs:  height  -  rows  of  original  data 

*  width  -  columns  of  original  data 

*  numRows  -  rows  of  data  to  be  transformed 

*  numCols  -  columns  of  data  to  be  transformed 

*  * image  -  pointer  to  image  data 

* 

*  Output:  *imageFFT  -  shifted  spectrum  of  input  image 
*/ 

void  complexMult (int  arraySize, 

fftw_complex  *inDatal, 
fftw_complex  *inData2, 
fftw_complex  *outData) ; 

/*  Description:  Multiplies  complex  data  of  type  "f ftw_complex" 

* 

*  Inputs:  arraySize  -  size  of  the  input  arrays  to  be  multiplied 

*  *inDatal  -  pointer  to  first  complex  array 

*  *inData2  -  pointer  to  second  complex  array 


width  -  number  of  columns  in  spatial  mask 
numRows  -  total  number  of  rows  in  image 
numCols  -  total  number  of  columns  in  image 

*boxMaskFFT  -  pointer  to  spectrum  of  mask 


71 


* 

*  Output:  *outData  -  pointer  to  result  of  multiplied  arrays 
*/ 

void  getRMSImage (int  numRows, 
int  numCols, 
double  * image, 
fftw_complex  *maskFFT, 
double  *RMSImage) ; 

/*  Description:  Calculates  RMS  data  for  a  given  image 

* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns 

*  * image  -  pointer  to  image  data 

*  *maskFFT  -  pointer  to  spectrum  of  square 

*  (or  circular)  flat  mask 

* 

*  Output:  *RMSImage  -  pointer  to  RMS  image  data 
*/ 

double  complexMag (double  realNum, 
double  imagNum) ; 

/*  Description:  Returns  the  magnitude  of  complex  number 

* 

*  Inputs:  realNum  -  real  part  of  complex  number 

*  imagNum  -  imaginary  part  of  complex  number 

* 

*  Output :  returns  magnitude  of  type  double 
*/ 

template<class  T> 

void  findMaxd  *arrayPtr, 

int  arraySize, 

T&  maxVal) ; 

/*  Description:  Finds  the  maximum  value  of  an  array 

* 

*  Inputs:  *arrayPtr  -  pointer  to  input  array 

*  arraySize  -  size  of  the  array 

* 

*  Output :  maxVal  -  the  maximum  value  of  the  array 
*/ 

template<class  T> 

void  findMind  *arrayPtr, 

int  arraySize, 

T&  minVal) ; 

/*  Description:  Finds  the  minimum  value  of  an  array 
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* 

*  Inputs:  *arrayPtr  -  pointer  to  input  array 

*  arraySize  -  size  of  the  array 

* 

*  Output :  minVal  -  the  minimum  value  of  the  array 
*/ 

void  fftWrite(int  fftRows, 
int  fftCols, 
fftw_complex  *fftData, 
string  dirPath, 
string  fileName) ; 

/*  Description:  Write  FFT  magnitude  data  to  file 

*  (currently  for  debugging  purposes) 

* 

*  Inputs:  fftRows  -  number  of  rows  in  FFT  data 

*  fftCols  -  number  of  columns  in  FFT  data 

*  *fftData  -  pointer  to  FFT  spectrum 

*  dirPath  -  path  of  directory  where  file  is  written 

*  fileName  -  filename  of  the  file  written 

* 

*  Output:  File  with  FFT  spectrum  magnitudes  written 

*  as  unsigned  integers  from  [0,65535] 

*/ 

void  spaceWrite (int  numRows, 
int  numCols, 
double  *data, 
string  dirPath, 
string  fileName) ; 

/*  Description:  Write  spatial  domain  data  to  file 

* 

*  Inputs :  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  *data  -  pointer  to  image  data 

*  dirPath  -  path  of  directory  where  file  is  written 

*  fileName  -  filename  of  the  file  written 

* 

*  Output:  File  with  spatial  data  written  as  unsigned 

*  integers  ranging  from  [0,65535] 

*/ 

void  nonmax (int  numRows, 
int  numCols, 
double  *imgData, 
double  *nonMaxImg, 
int  minDistX=16, 
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/* 

* 

* 

* 

* 

* 

* 

* 

*/ 


int  minDistY=16) ; 

Description:  Non-maximal  suppression  of  an  image 

Inputs:  numRows  -  number  of  rows  in  image 

numCols  -  number  of  columns  in  image 
*imgData  -  pointer  to  image  data 

Output:  *nonMaxImg  -  pointer  to  image  under 

non-maximal  suppression 


void  randpermC const  int  arraySize, 
int  *randArray) ; 

/*  Description:  Random  permutation  of  integers 

*  (Similar  functionality  to  MATLAB's  "randperm") 

* 

*  Input:  arraySize  -  size  of  the  random  array 

* 

*  Output:  *randArray  -  pointer  to  random  array 
*/ 


void  randn(int  arraySize, 

double  *normalRandArray) ; 

/*  Description:  Generates  array  of  Gaussian  random 

*  numbers  with  mean  zero  and  variance  one 

*  using  the  Box-Mueller  transformation  method 

*  (Similar  to  MATLAB's  "randn") 

* 

*  Input:  arraySize  -  size  of  random  array 

* 

*  Output:  *normalRandArray  -  pointer  to  random  array 
*/ 

void  mysort (vector<double>  doublevecin, 
vector<double>  fedoublevec, 
vector<int>  feidxvec) ; 

/*  Description:  Sorts  data  in  a  vector  by  smallest  value  ascending 

*  (double  precision  data) 

* 

*  Input :  doublevecin  -  input  vector  (unsorted) 

* 

*  Outputs:  doublevec  -  sorted  vector 

*  idxvec  -  indices  of  original  data  in  sorted  vector 
*/ 


template<class  T> 
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void  swap_vals(T&  vl, 

T&  v2) ; 

/*  Description:  Swap  the  values  of  two  arguments 
* 

*  Inputs:  vl  -  first  value 

*  v2  -  second  value 

* 

*  Outputs:  vl  -  second  value 

*  v2  -  first  value 
*/ 


int 


/* 

* 

* 

* 

* 

* 

* 

* 

*/ 


index_of _smallest (vector<double>  invec , 
int  start_index, 
int  vecsize) ; 

Description:  Finds  the  index  of  the  smallest  value  of  array 

Inputs:  invec  -  input  vector 

start_index  -  starting  index  from  which  to 
find  the  smallest  value 
vecsize  -  size  of  input  vector 

Output:  integer  representing  index  of  smallest  value 


2.  detectionTools.h 


#ifndef  DETECTI0NT00LS_H 
#define  DETECTI0NT00LS_H 
/* 

*  detectionTools.h 

* 

*  Created  by  jbarnes  on  9/25/05. 

*  Jeff  Barnes 

*  jbarnes@umr.edu 

*  Copyright  2005 

*  Airborne  Reconnaissance  and  Image  Analysis  Laboratory 

*  University  of  Missouri  -  Rolla 

*  All  rights  reserved. 

* 

*/ 

#include<iostream> 

#include<string> 

#include<cmath> 


using  namespace  std; 
#include  "fftwB.h" 

#include  "imageUtils .h" 
#include  "rx_statistics .h" 


75 


struct  targetData 

{ 

vector<int>  rowLocs; 
vector<int>  colLocs; 
vector<double>  vals; 

}; 

/*  Description:  Structure  containing  pixel  locations 

*  of  anomalies  and  their  likelihood  values 
*/ 

void  singleBandRX(int  numRows, 
int  numCols, 
fftw_complex  *tMaskFFT, 
fftw_complex  *mMaskFFT, 
fftw_complex  *cMaskFFT, 
double  * input Img,  double  *rxlmg) ; 

/*  Description:  RX  algorithm  for  single-band  imagery 

* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  *tMaskFFT  -  pointer  to  target  mask  spectrum 

*  *mMaskFFT  -  pointer  to  demeaning  mask  spectrum 

*  *cMaskFFT  -  pointer  to  covariance  mask  spectrum 

*  * input Img  -  pointer  to  image  data 

* 

*  Output :  *rxlmg  -  pointer  to  RX  algorithm  output  image 
*/ 

void  mult iBandRX( int  numRows, 
int  numCols, 
int  numBands, 
fftw_complex  *tMaskFFT, 
fftw_complex  *mMaskFFT, 
fftw_complex  *cMaskFFT, 
double  *multiBandImg, 
double  *rxlmg) ; 

/*  Description:  RX  algorithm  for  multispectral  imagery 

* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  numBands  -  number  of  bands  in  image 

*  *tMaskFFT  -  pointer  to  target  mask  spectrum 

*  *mMaskFFT  -  pointer  to  demeaning  mask  spectrum 

*  *cMaskFFT  -  pointer  to  covariance  mask  spectrum 

*  *multiBandImg  -  pointer  to  image  data 

* 
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*  Output :  *rxlmg  -  pointer  to  RX  algorithm  output  image 

*/ 

void  multiComplexMult (int  inSizel, 

int  inSize2, 
fftw_complex  *inl, 
fftw_complex  *in2, 
fftw_complex  *out) ; 

/*  Description:  Complex  multiplication  of  a  single  array 

*  of  size  inSizel  by  an  array  of  size  inSize2 

*  where  inSizel  <  inSize2  and  inSizel  evenly 

*  divides  inSize2 

* 

*  Inputs:  inSizel  -  size  of  the  first  array 

*  inSize2  -  size  of  the  second  array 

*  *inl  -  pointer  to  first  array  of  complex  data 

*  *in2  -  pointer  to  second  array  of  complex  data 

* 

*  Output:  *out  -  pointer  to  output  result 

*/ 

void  createRXMasksFFT(int  targetRadius , 

int  demeaningRadius , 
int  blankingRadius , 
int  numRows, 
int  numCols, 
fftw_complex  ^tMaskFFT, 
fftw_complex  *mMaskFFT, 
fftw_complex  *cMaskFFT) ; 

/*  Description:  Create  RX  masks  in  the  frequency  domain 

* 

*  Inputs:  targetRadius  -  radius  of  target  mask 

*  demeaningRadius  -  radius  of  demeaning  maskk 

*  blankingRadius  -  radial  width  of  blanking  region 

*  numRows  -  rows  of  data  on  which  RX  operates 

*  numCols  -  columns  of  data  on  which  RX  operates 

* 

*  Outputs:  *tMaskFFT  -  pointer  to  target  mask  spectrum 

*  *mMaskFFT  -  pointer  to  demeaning  mask  spectrum 

*  *cMaskFFT  -  pointer  to  covariance  data 

*/ 

void  multiFFT2 (int  numRows, 
int  numCols, 
int  howmany, 
double  *input_space , 
fftw_complex  *output_freq) ; 
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/*  Description:  Performs  forward  FFTs  on  a  batch  of  images 

*  (used  to  gather  target  and  covariance 

*  data  for  multispectral  RX) 

* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  howmany  -  number  of  images  in  batch 

*  *input_space  -  pointer  to  spatial  domain 

*  input  imagery 

* 

*  Output:  *output_freq  -  pointer  to  output  spectra 
*/ 

void  multiIFFT2 (int  numRows, 
int  numCols, 
int  howmany, 

fftw_complex  *input_freq, 
double  * output _ space) ; 

/*  Description:  Performs  inverse  FFTs  on  a  batch  of  spectra 

*  (used  for  multispectral  RX) 

* 

*  Inputs:  numRows  -  number  of  rows  of  FFT  data 

*  numCols  -  number  of  columns  of  FFT  data 

*  howmany  -  number  of  spectra  in  batch 

*  *input_freq  -  pointer  to  spectra 

* 

*  Output:  *output_space  -  pointer  to  spatial  domain  data 
*/ 

void  calcRXstats (int  numRows, 
int  numCols, 
int  numBands, 
double  *targData, 
double  *covData, 
double  *rxlmg) ; 

/*  Description:  Calculate  RX  statistics  once  the 

*  target  and  covariance  data  have  been 

*  calculated  (multispectral  RX) 

* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  numBands  -  number  of  bands  in  image 

*  *targData  -  pointer  to  target -mean  data 

*  *covData  -  pointer  to  covariance  data 

* 

*  Output :  *rxlmg  -  RX  algorithm  output 
*/ 
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void  getTargetLocs (int  numRows, 

int  numCols, 
double  *rxlmg, 
targetData  fetdata, 
double  threshold  =  2.0, 
int  numLocs  =  20) ; 

/*  Description:  Gather  target  locations  from  RX  output 
* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  *rxlmg  -  pointer  to  RX  output  data 

* 

*  Output :  tdata  -  structure  containing  row  and 

*  column  locations  and  likelihood 

*  values  of  anomalies 
*/ 

#endif 

3.  registrationTools.h 

#ifndef  REGISTRATI0NTD0LS_H 
#define  REGISTRATI0NTD0LS_H 

/* 

*  registrationTools_l .h 

* 

*  Created  by  jbarnes  on  9/25/05. 

*  Jeff  Barnes 

*  jbarnes@umr.edu 

*  Copyright  2005 

*  Airborne  Reconnaissance  and  Image  Analysis  Laboratory 

*  University  of  Missouri  -  Rolla 

*  All  rights  reserved. 

* 

*/ 

#include  <iostream> 

#include  <cmath> 

#include  <vector> 

using  namespace  std; 

#include  " Image. h" 

#include  "fftwB.h" 


struct  controlPointData 

{ 
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vector<int>  rowLocsl; 
vector<double>  rowLocs2; 
vector<int>  colLocsl; 
vector<double>  colLocs2; 
vector<double>  vals; 


}; 

/*  Description:  Structure  containing  control  points 

*  in  both  the  reference  and  sensed  images 

*  along  with  correlation  values 
*/ 


void  findCorrsForlmg (double  *lastlmg, 

double  *img, 

controlPointData  fecpdata, 
int  maskHeight=32 , 
int  maskWidth=32) ; 

Find  correspondences  between  a  reference 
and  sensed  image  using  correlation-based 
method 


/* 

* 

* 

* 

* 

* 

* 

* 

*/ 


Description: 


Inputs:  *lastlmg  -  pointer  to  reference  image 

*img  -  pointer  to  sensed  image 

Output :  cpdata  -  control  point  data  structure 


Description: 


template<class  T> 

void  cleanControlPointData(controlPointData  fecpdata, 

T  minDist) ; 

If  two  control  points  are  found  in  the  control 
point  data  and  are  less  than  minDist 
pixel  values  in  distance,  then  the  control 
points  with  the  lesser  correlation  values 
are  removed  from  the  list  of  control  points. 
The  control  points  are  then  sorted  from 
maximum  correlation  value  descending. 


/* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

*/ 


Inputs:  cpdata  -  structure  containing  control  point  data 

minDist  -  minimum  distance  between  control  points 

Output:  cpdata  -  structure  containing  control  points 

sorted  from  maximum  correlation  descending 


#endif 


APPENDIX  B. 

C++  FUNCTION  PROTOTYPES  FOR  SCALE-INVARIANT 


FEATURE  TRANSFORM 
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This  appendix  contains  the  interface  file  SIFTutils.h.  This  file  includes  the 
function  prototypes  for  the  SIFT  registration  algorithm. 

#ifndef  SIFTUTILS_H 
#define  SIFTUTILS_H 

/* 

*  SIFTutils.h 

* 

*  Created  by  jbarnes  on  10/4/05. 

*  Jeff  Barnes 

*  jbarnes@umr.edu 

*  Copyright  2005 

*  Airborne  Reconnaissance  and  Image  Analysis  Laboratory 

*  University  of  Missouri  -  Rolla 

*  All  rights  reserved. 

* 

*/ 

#include<iostream> 

#include<vector> 

#include  " image. h" 

using  std: : vector; 

struct  extrema 

{ 

vector<int>  rows; 
vector<int>  cols; 
vector<int>  scale; 
vector<int>  octave; 

}; 

/*  Description:  Structure  containing  extrema  locations 
*/ 

struct  peaks 

{ 

vector<int>  rows; 
vector<int>  cols; 
vector<double>  vals; 

}; 

/*  Description:  Structure  containing  peaks 
*  found  through  extrema  detection 

*/ 

struct  keypoints 

{ 
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vector<double>  rows; 
vector<double>  cols; 
vector<double>  scale; 
vector<int>  octave; 
vector<double>  orientations; 

}; 

/*  Description:  Structure  of  keypoint  data  containing 

*  row  location,  column  location,  scale, 

*  octave  and  orientation 
*/ 

struct  controlPointData_SIFT 

{ 

vector<double>  rowLocsl; 
vector<double>  rowLocs2; 
vector<double>  colLocsl; 
vector<double>  colLocs2; 
vector<double>  vals; 

}; 

/*  Description:  Structure  with  control  point  data 

*  including  row  and  column  locations 

*  in  the  reference  and  sensed  images 

*  and  the  value  of  the  distance 

*  between  their  keypoint  descriptors 

*  in  feature  space 
*/ 

void  imageExpand(int  numRows_orig, 
int  numCols_orig, 
double  *img, 
double  *expandedlmg) ; 

/*  Description:  Image  expansion  via  bilinear  interpolation 

* 

*  Inputs:  numRows_orig  -  number  of  rows  in  original  image 

*  numCols_orig  -  number  of  columns  in  original  image 

*  *img  -  pointer  to  image  data 

* 

*  Output :  *expandedlmg  -  pointer  to  expanded  image  data 
*/ 

void  convGaussianFast (int  numRows, 

int  numCols, 
double  * input Img, 
double  *smoothedImg, 
double  sigma) ; 

/*  Description:  Fast  2D  convolution  with  separable 

*  Gaussian  kernels 
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* 

* 

* 

* 

* 

* 

* 

*/ 


Inputs : 


Output : 


numRows  -  number  of  rows  in  image 
numCols  -  number  of  columns  in  image 
* input Img  -  pointer  to  input  image 
sigma  -  standard  deviation  of  kernel 

*smoothedImg  -  result  from  convolution 


void  buildFirstOctave (int  numRows, 

int  numCols , 
int  s , 

double  *inputlmg, 
double  *dogOctave, 
double  *nextDctaveStart , 
double  *gausslmages) ; 

Description:  Build  the  first  octave  of  scale-space 

and  dif f erence-of-Gaussian  images 


/* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

*/ 


Inputs:  numRows  -  number  of  rows  in  image 

numCols  -  number  of  columns  in  image 
s  -  Lowe's  parameter  "s" 

* input Img  -  pointer  to  input  image 

Outputs:  *dogOctave  -  octave  of  DoG  images 

*nextOctaveStart  -  pointer  to  data  used  to  build  next  octave 
*gausslmages  -  octave  of  scale-space 


void  buildOctave (int  numRows, 
int  numCols, 
int  s , 

int  octaveNum, 
double  *prevDctaveStart , 
double  *dogDctave, 
double  *nextDctaveStart , 
double  *gausslmages) ; 

/*  Description:  Build  subsequent  octaves  after  the  first 

* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  s  -  Lowe's  parameter  "s" 

*  octaveNum  -  number  of  the  octave  to  build 

*  *prevOctaveStart  -  pointer  of  data  used  to  build  octave 

* 

*  Outputs:  *dogOctave  -  dif f erence-of-Gaussian  images  for  octave 

*  *nextOctaveStart  -  pointer  to  data  used  to  build  next  octave 
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*  *gausslmages  -  pointer  to  scale-space  for  octave 
*/ 

void  buildPyramid(int  numRows, 
int  numCols, 
int  s , 

int  totalOctaves , 
double  * input Img, 
double  ^pyramid, 
double  *gausslmages) ; 

/*  Description:  Build  complete  scale-space  and  DoG  pyramids 

* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  s  -  Lowe's  parameter  "s" 

*  totalOctaves  -  number  of  octaves  in  pyramid 

*  *inputlmage  -  pointer  to  original  input  image 

* 

*  Outputs:  ^pyramid  -  pointer  to  DoG  pyramidd 

*  *gausslmages  -  pointer  to  scale-space  images 
*/ 

void  f indExtrema(int  numRows, 
int  numCols, 
int  s , 

int  totalOctaves, 
double  ^pyramid, 
extrema  fedogExtrema) ; 

/*  Description:  Find  extrema  in  DoG  images  with  check 

*  of  nearest  neighbors 

* 

*  Inputs:  numRows  -  number  of  rows  in  image 

*  numCols  -  number  of  columns  in  image 

*  s  -  scale  parameter 

*  totalOctaves  -  total  number  of  octaves  in  pyramid 

*  ^pyramid  -  pointer  to  DoG  pyramid 

* 

*  Output :  dogExtrema  -  structure  with  extrema  in  DoG  images 
*/ 

void  f indMaxima(int  numRows, 
int  numCols, 
double  *img, 
peaks  fepeakList) ; 

/*  Description:  Find  maxima  in  DoG  plane  of  pyramid 

* 

*  Inputs:  numRows  -  number  of  rows  in  octave 
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* 

* 

* 

*  Output : 

* 

*/ 

void  f indMinima(int  numRows, 
int  numCols, 
double  *img, 
peaks  fepeakList) ; 

/*  Description:  Find  minima  in  DoG  plane  of  pyramid 
* 

*  Inputs:  numRows  -  number  of  rows  in  octave 

*  numCols  -  number  of  columns  in  octave 

*  *img  -  pointer  to  plane  in  DoG  images 

* 

*  Output :  peakList  -  structure  containing  peaks  in 

*  a  plane  of  DoG  images 
*/ 

void  ref ineMax(int  rows, 
int  cols, 
int  octave, 
int  scale, 
double  *sclPtr, 

peaks  &peakList ,  extrema  fepyrExtrema) ; 

/*  Descriptions:  Checks  scales  adjacent  to  the  scale  of  the 

*  extremum  for  values  greater  than  the 

*  values  in  peakList 

* 

*  Inputs:  rows  -  number  of  rows  in  octave 

*  cols  -  number  of  columns  in  octave 

*  octave  -  current  octave 

*  scale  -  current  scale 

*  *sclPtr  -  pointer  to  plane  of  scale  space 

*  peakList  -  list  of  possible  extrema 

* 

*  Output :  pyrExtrema  -  extrema  that  survive  are  placed 

*  in  this  structure 
*/ 

void  ref ineMin(int  rows, 
int  cols, 
int  ocatve, 
int  scale, 
double  *sclPtr, 


numCols  -  number  of  columns  in  octave 
*img  -  pointer  to  plane  in  DoG  images 

peakList  -  structure  containing  peaks  in 
a  plane  of  DoG  images 
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peaks  fepeakList , 
extrema  fepyrExtrema) ; 

/*  Description:  Similar  to  "refineMax",  except 

*  the  check  is  for  minima 
*/ 

void  ref ineExtrema(int  nRows, 

int  nCols, 
int  s , 

int  numOctaves, 
double  ^pyramid, 
extrema  fepyrExtrema, 
keypoints  fekeyData) ; 

/*  Description:  Refines  the  extrema  by  interpolating  their 

*  values  and  checking  if  it  is  above  a  certain 

*  threshold  in  the  DoG  images .  Extrema  which 

*  have  a  large  ratio  of  principal  curvatures 

*  are  also  discarded. 

* 

*  Inputs:  nRows  -  number  of  rows  in  image 

*  nCols  -  number  of  columns  in  image 

*  s  -  scale  parameter 

*  numOctaves  -  total  number  of  octaves 

*  ^pyramid  -  pointer  to  DoG  pyramid  data 

*  pyrExtrema  -  structure  containing  list  of  extrema 

* 

*  Output:  keyData  -  structure  containing  keypoints  for  image 
*/ 

void  getHessianData(int  sclRows, 

int  sclCols, 
double  *ptr, 
double  xy[3]  [3] , 
double  xSigma[3]  [3] , 
double  ySigma[3]  [3]); 

/*  Description:  Gather  data  in  DoG  pyramid  needed 

*  calculate  the  3x3  Hessian  matrix 

* 

*  Inputs:  sclRows  -  number  of  rows  in  octave 

*  sclCols  -  number  of  columns  in  octave 

*  *ptr  -  pointer  to  singular  location  in  DoG  pyramid 

* 

*  Outputs:  xy  -  3  X  3  array  of  data  in  x-y  direction 

*  (x-y  ~  column-row) 

*  xSigma  -3x3  array  of  data  in  x-sigma  direction 

*  ySigma  -3x3  array  of  data  in  y-sigma  direction 

*/ 
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void  calcHessian (double  xy[3] [3] , 

double  xSigma[3] [3] , 
double  ySigma[3] [3]  , 
double  hessiau[3] [3] , 
double  derivative [3] ) ; 

/*  Descriptiou:  Calculates  the  Hessiau  data  from 

*  the  data  gathered  iu  getHessiauData 

* 

*  luputs:  xy  -  3  X  3  array  of  data  iu  x-y  directiou 

*  xSigma  -3x3  array  of  data  iu  x-sigma  directiou 

*  ySigma  -3x3  array  of  data  iu  y-sigma  directiou 

* 

*  Outputs:  hessiau  -3x3  array  represeutiug  Hessiau 

*  matrix  at  specific  locatiou 

*  derivate  -  leugth  3  row  vector  of  derivatives 

*  of  DoG  data  w.r.t.  x,  y,  aud  sigma 
*/ 

void  calcOff set (double  hessiau [3] [3] , 
double  derivatve [3] , 
double  offset [3]); 

/*  Descriptiou:  Calculate  the  iuterpolated  offset  from 

*  the  Hessiau  aud  derivative  data 

* 

*  luputs:  hessiau  -3x3  array  represeutiug  matrix  of 

*  Hessiau  data 

*  derivative  -  leugth  3  row  vector  f  derivatives 

* 

*  Output:  offset  -  leugth  3  row  vector  of  offsets  iu  the 

*  X,  y,  aud  sigma  directious 
*/ 

void  compMagAudOrieut (iut  uumRows, 

iut  uumCols, 
iut  s , 

iut  totalOctaves , 
double  *gausslmages , 
double  *maguitudes, 
double  *orieutatious) ; 

/*  Descriptiou:  Precompute  maguitudes  aud  orieutatious 

*  over  all  plaues  of  scale-space 

* 

*  luputs:  uumRows  -  uumber  of  rows  iu  origiual  image 

*  uumCols  -  uumber  of  columus  iu  origiual  image 

*  s  -  scale  parameter 

*  totalOctaves  -  total  uumber  of  octaves  iu  scale-space 


*gausslmages  -  pointer  to  scale-space 


* 

* 

*  Outputs:  ^magnitudes  -  pointer  to  magnitude  data 

*  ^orientations  -pointer  to  orientation  data 
*/ 

void  compMagnitudes (int  numRows, 

int  numCols, 
double  *sclPtr, 
double  *magPtr) ; 

/*  Description:  Compute  magnitude  data  from  single 

*  plane  of  scale-space 

* 

*  Inputs:  numRows  -  number  of  rows  in  octave 

*  numCols  -  number  of  columns  in  octave 

*  *sclPtr  -  pointer  to  plane  of  scale-space 

* 

*  Output :  *magPtr  -  pointer  to  plane  of  magnitudes 
*/ 

void  compOrientations (int  numRows, 

int  numCols, 
double  *sclPtr, 
double  *orPtr) ; 

/*  Description:  Similar  to  "compMagnitudes"  except 

*  calculation  of  orientations 
*/ 

void  assignOrientations(int  numRows, 

int  numCols, 
int  s , 

int  totalOctaves , 
keypoints  fekeyData, 
keypoints  &keyData2, 
double  ^magnitudes, 
double  ^orientations) ; 

/*  Description:  Assigns  an  orientation  to  a  keypoint 

* 

*  Inputs:  numRows  -  number  of  rows  in  original  image 

*  numCols  -  number  of  columns  in  original  image 

*  s  -  scale  parameter 

*  totalOctaves  -  total  number  of  octaves 

*  keyData  -  structure  with  keypoints 

*  ^magnitudes  -  pointer  to  magnitude  data 

*  ^orientations  -  pointer  to  orientation  data 

* 

*  Output:  keyData  -  orientation  vector  of  keyData  structure 
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*  is  computed  and  filled 

*  keyData2  -  any  keypoints  with  two  orientations 

*  have  another  keypoint  created  within 

*  this  structure 
*/ 

void  fitPolynomialMax (double  abscissa [3] , 

double  poly  [3] , 
double  femaxAbscissa) ; 

/*  Description:  Interpolates  the  maximum  location 

*  of  the  histogram  during  orientation 

*  assignment 

* 

*  Inputs:  abscissa  -  size  3  array  of  abscissas 

*  poly  “  size  3  array  of  values  at  the  abscissas 

* 

*  Output:  maxAbscissa  -  interpolated  peak  abscissa 
*/ 

void  mergeKeypoints (keypoints  fekeyDatal, 

keypoints  keyData2) ; 

/*  Description:  Merges  the  keypoints  in  keyData2 

*  with  the  keypoints  in  keyDatal . 

*  All  keypoints  are  kept  in  keyDatal. 

*/ 

void  buildDescriptors (int  numRows, 

int  numCols , 
int  s , 

int  totalOctaves , 
keypoints  fekeyData, 
double  ^magnitudes , 
double  ^orientations , 
double  ^descriptors) ; 

/*  Description:  Builds  the  feature-space  descriptors 

*  from  the  keypoint  data  and  magnitudes 

*  and  orientation  data 

* 

*  Inputs:  numRows  -  number  of  rows  in  original  image 

*  numCols  -  number  of  columns  in  original  image 

*  s  -  scale  parameter 

*  totalOctaves  -  total  number  of  octaves 

*  keyData  -  structure  containing  keypoint  data 

*  ^magnitudes  -  pointer  to  magnitude  data 

*  ^orientations  -  pointer  to  orientation  data 

* 

*  Output :  ^descriptors  -  pointer  to  array  of  keypoint  descriptors 
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*/ 

void  getDescriptor (double  *chipMag, 

double  *chipOr, 
double  ^descriptor) ; 

/*  Description:  Construct  the  descriptor  from  chips 

*  extracted  from  the  magnitude  and 

*  orientation  data 

* 

*  Inputs:  *chipMag  -  pointer  to  magnitude  chip  (16  x  16) 

*  *chipOr  -  pointer  to  orientation  chip  (16  x  16) 

* 

*  Output:  ^descriptor  -  pointer  to  descriptor  (1  x  128) 

*/ 

void  f indMatchingPoints (keypoints  keyData_l, 

keypoints  keyData_2, 
double  *descriptors_l , 
double  *descriptors_2 , 
controlPointData_SIFT  fecpData) ; 

/*  Description:  Finds  matching  points  in  feature  space  via 

*  brute  force  search 

* 

*  Inputs:  keyData_l  -  keypoints  from  the  reference  image 

*  keyData_2  -  keypoints  from  the  sensed  image 

*  *descriptors_l  -  reference  image  descriptors 

*  *descriptors_2  -  sensed  image  descriptors 

* 

*  Output:  cpData  -  structure  containing  control  point  data 
*/ 

void  SIFT(int  nRows, 
int  nCols, 
double  *currData, 
double  *lastData, 
controlPointData_SIFT  fecpData) ; 

/*  Description:  SIFT  algorithm  for  registration 

* 

*  Inputs:  nRows  -  number  of  rows  in  input  images 

*  nCols  -  number  of  columns  in  imput  images 

*  *currData  -  pointer  to  sensed  image 

*  *lastData  -  pointer  to  reference  image 

* 

*  Output:  cpData  -  structure  containing  control  point  data 
*/ 


#endif 
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